r - How to make a pretty borderless geographic thematic / heat map with weighted (survey) data in R, probably using spatial smoothing on point observations

Tags: r map survey nearest-neighbor smoothing

Ever since Joshua Katz published these dialect maps that you can find all over the web, built from harvard's dialect survey, I have been trying to copy and generalize his methods... but much of it is over my head. Josh disclosed some of his methods in this poster, but (as far as I know) has not revealed any of his code.

My goal is to generalize these methods so that users of any major US government survey data set can easily drop their weighted data into a function and get a reasonable geographic map. Geography varies: some survey data sets have ZCTAs, some have counties, some have states, some have metro areas, and so on. It is probably smart to start by plotting every point at its centroid - centroids are discussed here and are available for most geographies in the census bureau's 2010 gazetteer files. So, for each survey data point, you have a point on the map. But some survey responses have a weight of 10 while others have a weight of 100,000! Obviously, whatever "heat" or smoothing or coloring ends up on the map needs to account for those different weights.
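To make the "one point per centroid, sized by weight" idea concrete, here is a minimal sketch of the kind of starting plot I have in mind -- the data frame pts with lon, lat, and weight columns is hypothetical, not taken from any of the files below:

library(maps)

# draw a background map of the lower 48 states
map( "state" )

# overlay one semi-transparent circle per observation,
# with circle area roughly proportional to its survey weight
symbols(
    pts$lon , pts$lat ,
    circles = sqrt( pts$weight / max( pts$weight ) ) ,
    inches = 0.15 , add = TRUE , fg = NA , bg = rgb( 1 , 0 , 0 , 0.4 )
)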

I am good with survey data, but I know nothing about spatial smoothing or kernel estimation. The method Josh uses in his poster is k-nearest neighbor kernel smoothing with gaussian kernel, which is foreign to me. I am a novice at mapping, but I can usually get things working if I know what the goal should be.
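In case it helps pin down the target, here is my rough reading of what weighted k-nearest-neighbor gaussian kernel smoothing does at a single grid cell -- the function knn_gauss_smooth and its arguments below are made up for illustration and are not Josh's actual code:

# weighted k-nearest-neighbor smoothing with a gaussian kernel, one grid point at a time
knn_gauss_smooth <- function( grid.pt , pts , values , weights , k = 50 ){

    # squared euclidean distance from the grid point to every observation
    d2 <- ( pts[ , 1 ] - grid.pt[ 1 ] )^2 + ( pts[ , 2 ] - grid.pt[ 2 ] )^2

    # keep only the k nearest observations
    nn <- order( d2 )[ seq_len( min( k , length( d2 ) ) ) ]

    # gaussian kernel weights, with the bandwidth set to the distance of the k-th neighbor
    h <- sqrt( max( d2[ nn ] ) )
    kw <- exp( -d2[ nn ] / ( 2 * h^2 ) )

    # fold the survey weights into the kernel weights
    w <- kw * weights[ nn ]

    # weighted average of the observed values at the k nearest points
    sum( w * values[ nn ] ) / sum( w )
}

# evaluating that function over every cell of a regular grid (and then coloring the
# cells) is, as i understand it, the basic shape of the dialect-map approach; for a
# categorical answer you would smooth an indicator variable for each response
# category and color each cell by whichever category comes out on top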

Note: this question is very similar to a question asked ten months ago that no longer contains available data. There are also bits of information on this thread, but if someone has a clever way to answer my exact question, I would obviously rather see that.

The R survey package has an svyplot function, and if you run these lines of code, you can see weighted data in cartesian coordinates. But really, for what I would like to do, the plot needs to be overlaid on a map.

library(survey)
# example api data sets that ship with the survey package
data(api)
# stratified survey design with a finite population correction
dstrat <- svydesign( id = ~1 , strata = ~stype , weights = ~pw , data = apistrat , fpc = ~fpc )
# weighted bubble plot of the data in cartesian coordinates
svyplot( api00 ~ api99 , design = dstrat , style = "bubble" )

In case it is of any use, I have posted some example code below that gives anyone willing to help me a quick way to start with some survey data at core-based statistical areas (another geography type).

Any ideas, advice, or guidance would be appreciated (and credited, if I can get a formal tutorial / guide / how-to written for http://asdfree.com/).

Thanks!!!!!!!!
# load a few mapping libraries
library(rgdal)
library(maptools)
library(PBSmapping)


# specify some population data to download
mydata <- "http://www.census.gov/popest/data/metro/totals/2012/tables/CBSA-EST2012-01.csv"

# load mydata
x <- read.csv( mydata , skip = 9 , h = F )

# keep only the GEOID and the 2010 population estimate
x <- x[ , c( 'V1' , 'V6' ) ]

# name the GEOID column to match the CBSA shapefile
# and name the weight column the weight column!
names( x ) <- c( 'GEOID10' , "weight" )

# throw out the bottom few rows
x <- x[ 1:950 , ]

# convert the weight column to numeric
x$weight <- as.numeric( gsub( ',' , '' , as.character( x$weight ) ) )

# now just make some fake trinary data
x$trinary <- c( rep( 0:2 , 316 ) , 0:1 )

# simple tabulation
table( x$trinary )

# so now the `x` data file looks like this:
head( x )

# and say we just wanted to map
# something easy like
# 0=red, 1=green, 2=blue,
# weighted simply by the population of the cbsa

# # # end of data read-in # # #


# # # shapefile read-in? # # #

# specify the tiger file to download
tiger <- "ftp://ftp2.census.gov/geo/tiger/TIGER2010/CBSA/2010/tl_2010_us_cbsa10.zip"

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the tiger file to the local disk
download.file( tiger , tf , mode = 'wb' )

# unzip the tiger file into the temporary directory
z <- unzip( tf , exdir = td )

# isolate the file that ends with ".shp"
shapefile <- z[ grep( 'shp$' , z ) ]

# read the shapefile into working memory
cbsa.map <- readShapeSpatial( shapefile )

# remove CBSAs ending with alaska, hawaii, and puerto rico
cbsa.map <- cbsa.map[ !grepl( "AK$|HI$|PR$" , cbsa.map$NAME10 ) , ]

# cbsa.map$NAME10 now has a length of 933
length( cbsa.map$NAME10 )

# convert the cbsa.map shapefile into polygons..
cbsa.ps <- SpatialPolygons2PolySet( cbsa.map )

# but for some reason, cbsa.ps has 966 shapes??
nrow( unique( cbsa.ps[ , 1:2 ] ) )
# that seems wrong, but i'm not sure how to fix it?

# calculate the centroids of each CBSA
cbsa.centroids <- calcCentroid(cbsa.ps)
# (ignoring the fact that i'm doing something else wrong..because there's 966 shapes for 933 CBSAs?)
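# note: the 966-vs-933 difference is probably because SpatialPolygons2PolySet
# splits multi-part CBSAs (ones containing islands or other disconnected pieces)
# into a separate PID/SID row per ring, and calcCentroid then returns one
# centroid per ring rather than one per CBSA -- taking coordinates( cbsa.map ),
# as in the answer below, gives exactly one label point per CBSA instead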

# # # # # # as far as i can get w/ mapping # # # #


# so now you've got
# the weighted data file `x` with the `GEOID10` field
# the shapefile with the matching `GEOID10` field
# the centroids of each location on the map


# can this be mapped nicely?

Best Answer

I am not sure how much help I can be with the spatial smoothing, since it is a task I have little experience with, but I have spent some time making maps in R, so I hope what I add below helps with that part of your question.

I started editing your code at # # # shapefile read-in # # #; you will notice that I kept the map in the SpatialPolygonsDataFrame class and relied on the raster and gstat packages to build the grid and run the spatial smoothing. The spatial smoothing model is the part I am least comfortable with, but the process let me build a raster and demonstrate how to mask, project and plot it.

library(rgdal)
library(raster)
library(gstat)

# read in a base map
m <- getData("GADM", country="United States", level=1)
m <- m[!m$NAME_1 %in% c("Alaska","Hawaii"),]

# specify the tiger file to download
tiger <- "ftp://ftp2.census.gov/geo/tiger/TIGER2010/CBSA/2010/tl_2010_us_cbsa10.zip"

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the tiger file to the local disk
download.file( tiger , tf , mode = 'wb' )

# unzip the tiger file into the temporary directory
z <- unzip( tf , exdir = td )

# isolate the file that ends with ".shp"
shapefile <- z[ grep( 'shp$' , z ) ]

# read the shapefile into working memory
cbsa.map <- readOGR( shapefile, layer="tl_2010_us_cbsa10" )

# remove CBSAs ending with alaska, hawaii, and puerto rico
cbsa.map <- cbsa.map[ !grepl( "AK$|HI$|PR$" , cbsa.map$NAME10 ) , ]

# cbsa.map$NAME10 now has a length of 933
length( cbsa.map$NAME10 )

# extract centroid for each CBSA
cbsa.centroids <- data.frame(coordinates(cbsa.map), cbsa.map$GEOID10)
names(cbsa.centroids) <- c("lon","lat","GEOID10")

# add lat lon to population data
nrow(x)
x <- merge(x, cbsa.centroids, by="GEOID10")
nrow(x) # centroids could not be assigned to all records for some reason
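# (most likely because the alaska/hawaii/puerto rico CBSAs dropped from cbsa.map
#  above are still present in `x`, so the default inner merge discards them,
#  along with any GEOIDs that do not match the shapefile)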

# create a raster object
r <- raster(nrow=500, ncol=500, 
            xmn=bbox(m)["x","min"], xmx=bbox(m)["x","max"],
            ymn=bbox(m)["y","min"], ymx=bbox(m)["y","max"],
            crs=proj4string(m))

# run inverse distance weighted model - modified code from ?interpolate...needs more research
model <- gstat(id = "trinary", formula = trinary~1, weights = "weight", locations = ~lon+lat, data = x,
               nmax = 7, set=list(idp = 0.5))
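# (as i understand the gstat arguments: nmax caps how many of the nearest
#  observations are used at each prediction location and idp is the inverse
#  distance power, so those two are the knobs controlling how local or how
#  smooth the surface looks; the weights column is presumably what brings the
#  population weights into the interpolation)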
r <- interpolate(r, model, xyNames=c("lon","lat"))
r <- mask(r, m) # discard interpolated values outside the states

# project map for plotting (optional)
# North America Lambert Conformal Conic
nalcc <- CRS("+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
m <- spTransform(m, nalcc)
r <- projectRaster(r, crs=nalcc)

# plot map
par(mar=c(0,0,0,0), bty="n")
cols <- c(rgb(0.9,0.8,0.8), rgb(0.9,0.4,0.3),
          rgb(0.8,0.8,0.9), rgb(0.4,0.6,0.9),
          rgb(0.8,0.9,0.8), rgb(0.4,0.9,0.6))
col.ramp <- colorRampPalette(cols) # custom colour ramp
plot(r, axes=FALSE, legend=FALSE, col=col.ramp(100))
plot(m, add=TRUE) # overlay base map
legend("right", pch=22, pt.bg=cols[c(2,4,6)], legend=c(0,1,2), bty="n")

[image: the resulting smoothed, borderless map over the lower 48 states]

Original question and answer on Stack Overflow: https://stackoverflow.com/questions/23318243/
