r - How to automatically determine the state given latitude and longitude/coordinates?

Tags: r geocoding

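I have a data frame of roughly 17,000 lat/lon values, and I would like to use it to populate a new column with the corresponding state.

So far I have tried several solutions suggested in other Stack Overflow answers (too many to list here, but more than ten), and none of them worked for me.

The closest I came to a solution was with the ggmap package, but I was warned that I had exceeded the limit even though I had sent only a single lat/lon value.

I have the lat and lon values separately, and have even combined them into a lat,lon format, yet none of the solutions above worked for me.

What I want to do is determine the state from a given lat/lon/coord value and save it in a new column (df$state).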

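I initially matched on city names to get the state, but because several states contain cities with the same name, the matching process stops after the first successful match; as a result, I ended up with more than 2,800 cities assigned to AK even though they are actually thousands of miles apart.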
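
The first-match behaviour described above can be reproduced with a small sketch. This assumes the lookup was done with something like base R's match(), which the question does not confirm; the cities table and the query vector are made-up illustration data.

```r
# Hypothetical lookup table: the same city name appears in several states
cities <- data.frame(
  city  = c("Springfield", "Springfield", "Portland", "Portland"),
  state = c("IL", "MA", "ME", "OR"),
  stringsAsFactors = FALSE
)

# Hypothetical city names to resolve
query <- c("Springfield", "Portland", "Springfield")

# match() returns only the index of the FIRST matching row,
# so every "Springfield" resolves to the same state
matched_state <- cities$state[match(query, cities$city)]
print(matched_state)
```

A city name alone cannot distinguish between the duplicates, which is why a lookup by coordinates is the more reliable route.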

Any suggestions would be great.

Here are the first 100 rows of the coords, lat, and lon columns of my data:

structure(list(origin_coords = c("31.9618,-83.0588", "44.8782,-69.4718", 
"37.3894,-121.8868", "36.0485,-93.5044", "37.652,-120.7292", 
"33.7942,-84.2018", "32.0749,-81.0883", "31.0286,-97.6115", "40.7559,-111.8967", 
"39.8359,-91.7538", "35.922,-80.537", "39.8036,-75.0058", "43.072,-83.8424", 
"33.5207,-86.8025", "26.1216,-80.1288", "31.9618,-83.0588", "31.9618,-83.0588", 
"61.6303,-149.8181", "33.8687,-84.3351", "42.2196,-88.2426", 
"31.7943,-85.5581", "28.3067,-80.6862", "39.1157,-94.6271", "33.831,-85.7752", 
"39.2655,-76.4935", "32.9824,-87.7919", "61.6303,-149.8181", 
"31.086,-85.7192", "31.9618,-83.0588", "39.9048,-75.2946", "34.1132,-117.3771", 
"41.905,-71.1026", "42.3921,-97.4751", "31.2627,-86.7711", "42.5864,-71.4401", 
"33.7935,-93.807", "39.0097,-123.6523", "61.6303,-149.8181", 
"37.7235,-85.9769", "38.0624,-87.2452", "37.7166,-121.9226", 
"42.9993,-88.2196", "40.6316,-74.0927", "43.0892,-77.436", "39.8359,-91.7538", 
"38.5487,-89.5413", "35.833,-90.6965", "41.363,-89.0008", "37.7953,-95.9368", 
"33.4581,-83.0802", "33.7546,-93.6735", "32.7491,-96.4598", "41.8858,-87.6181", 
"40.7328,-74.0755", "31.2627,-86.7711", "31.9618,-83.0588", "61.6303,-149.8181", 
"38.4642,-85.7775", "40.6344,-92.9219", "37.8366,-89.1424", "42.5648,-83.0701", 
"39.5394,-76.3564", "33.8687,-84.3351", "41.4564,-90.7235", "42.0122,-87.8417", 
"38.8339,-104.8214", "36.4442,-92.5832", "39.838,-104.9988", 
"41.8378,-87.7602", "28.3051,-81.4242", "41.6052,-71.9808", "40.7808,-80.0592", 
"40.5364,-89.1885", "31.9618,-83.0588", "40.8915,-74.0119", "43.2078,-91.2976", 
"34.4574,-83.476", "36.4105,-92.1951", "40.0177,-75.2594", "36.0557,-96.0602", 
"44.694,-85.6763", "61.6303,-149.8181", "40.7446,-73.9345", "29.1989,-82.0874", 
"26.6048,-80.2149", "34.6909,-118.1491", "39.0289,-95.2086", 
"35.4074,-93.1355", "36.2523,-92.6907", "45.2097,-123.2043", 
"37.7953,-95.9368", "61.6303,-149.8181", "39.1157,-94.6271", 
"33.5793,-86.6375", "40.3757,-86.3201", "40.6344,-92.9219", "39.8359,-91.7538", 
"42.3921,-97.4751", "41.2564,-73.2111", "44.2767,-121.1896"), 
    origin_lat = c(31.9618, 44.8782, 37.3894, 36.0485, 37.652, 
    33.7942, 32.0749, 31.0286, 40.7559, 39.8359, 35.922, 39.8036, 
    43.072, 33.5207, 26.1216, 31.9618, 31.9618, 61.6303, 33.8687, 
    42.2196, 31.7943, 28.3067, 39.1157, 33.831, 39.2655, 32.9824, 
    61.6303, 31.086, 31.9618, 39.9048, 34.1132, 41.905, 42.3921, 
    31.2627, 42.5864, 33.7935, 39.0097, 61.6303, 37.7235, 38.0624, 
    37.7166, 42.9993, 40.6316, 43.0892, 39.8359, 38.5487, 35.833, 
    41.363, 37.7953, 33.4581, 33.7546, 32.7491, 41.8858, 40.7328, 
    31.2627, 31.9618, 61.6303, 38.4642, 40.6344, 37.8366, 42.5648, 
    39.5394, 33.8687, 41.4564, 42.0122, 38.8339, 36.4442, 39.838, 
    41.8378, 28.3051, 41.6052, 40.7808, 40.5364, 31.9618, 40.8915, 
    43.2078, 34.4574, 36.4105, 40.0177, 36.0557, 44.694, 61.6303, 
    40.7446, 29.1989, 26.6048, 34.6909, 39.0289, 35.4074, 36.2523, 
    45.2097, 37.7953, 61.6303, 39.1157, 33.5793, 40.3757, 40.6344, 
    39.8359, 42.3921, 41.2564, 44.2767), origin_lon = c(-83.0588, 
    -69.4718, -121.8868, -93.5044, -120.7292, -84.2018, -81.0883, 
    -97.6115, -111.8967, -91.7538, -80.537, -75.0058, -83.8424, 
    -86.8025, -80.1288, -83.0588, -83.0588, -149.8181, -84.3351, 
    -88.2426, -85.5581, -80.6862, -94.6271, -85.7752, -76.4935, 
    -87.7919, -149.8181, -85.7192, -83.0588, -75.2946, -117.3771, 
    -71.1026, -97.4751, -86.7711, -71.4401, -93.807, -123.6523, 
    -149.8181, -85.9769, -87.2452, -121.9226, -88.2196, -74.0927, 
    -77.436, -91.7538, -89.5413, -90.6965, -89.0008, -95.9368, 
    -83.0802, -93.6735, -96.4598, -87.6181, -74.0755, -86.7711, 
    -83.0588, -149.8181, -85.7775, -92.9219, -89.1424, -83.0701, 
    -76.3564, -84.3351, -90.7235, -87.8417, -104.8214, -92.5832, 
    -104.9988, -87.7602, -81.4242, -71.9808, -80.0592, -89.1885, 
    -83.0588, -74.0119, -91.2976, -83.476, -92.1951, -75.2594, 
    -96.0602, -85.6763, -149.8181, -73.9345, -82.0874, -80.2149, 
    -118.1491, -95.2086, -93.1355, -92.6907, -123.2043, -95.9368, 
    -149.8181, -94.6271, -86.6375, -86.3201, -92.9219, -91.7538, 
    -97.4751, -73.2111, -121.1896)), row.names = c(NA, 100L), class = "data.frame")

Best answer

Use the over function from the sp package:

library(geojsonio)
library(sp)

# get usa polygon data
# http://eric.clst.org/tech/usgeojson/
usa <- geojson_read(
  "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json", 
  what = "sp"
)

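df$state <- NA_character_

# look up the containing state polygon for each point
for (i in seq_len(nrow(df))) {
  # sp expects coordinates in (x, y) order, i.e. longitude first
  coords <- c(df$origin_lon[i], df$origin_lat[i])
  if (any(is.na(coords))) next
  point <- sp::SpatialPoints(
    matrix(
      coords,
      nrow = 1
    )
  )
  # treat the point as being in the same CRS as the polygon layer
  sp::proj4string(point) <- sp::proj4string(usa)
  polygon_check <- sp::over(point, usa)
  # NAME is NA when the point falls outside every state polygon
  df$state[i] <- as.character(polygon_check$NAME)
}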

> head(df)
      origin_coords origin_lat origin_lon      state
1  31.9618,-83.0588    31.9618   -83.0588    Georgia
2  44.8782,-69.4718    44.8782   -69.4718      Maine
3 37.3894,-121.8868    37.3894  -121.8868 California
4  36.0485,-93.5044    36.0485   -93.5044   Arkansas
5  37.652,-120.7292    37.6520  -120.7292 California
6  33.7942,-84.2018    33.7942   -84.2018    Georgia
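
With about 17,000 rows, the row-by-row loop can be slow. Since sp::over accepts many points at once, a vectorized variant is one possible alternative. The sketch below is not part of the original answer: lookup_state, make_square, toy_states and toy_df are hypothetical names, and the two toy squares only stand in for the real usa layer so that the example runs without a download.

```r
library(sp)

# Hypothetical helper: build one square polygon (clockwise ring, not a hole)
make_square <- function(xmin, ymin, xmax, ymax, id) {
  ring <- cbind(c(xmin, xmin, xmax, xmax, xmin),
                c(ymin, ymax, ymax, ymin, ymin))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

# Toy stand-in for the `usa` layer, with a NAME attribute like the real data
toy_states <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(
    make_square(0, 0, 1, 1, "a"),
    make_square(1, 0, 2, 1, "b")
  )),
  data = data.frame(NAME = c("West", "East"), row.names = c("a", "b"),
                    stringsAsFactors = FALSE)
)

# Toy points: two inside a square, one with a missing value, one outside
toy_df <- data.frame(origin_lon = c(0.5, 1.5, NA, 5),
                     origin_lat = c(0.5, 0.5, 0.5, 5))

# Hypothetical helper: look up all points in a single over() call
lookup_state <- function(df, polys) {
  state <- rep(NA_character_, nrow(df))
  ok <- !is.na(df$origin_lon) & !is.na(df$origin_lat)
  if (any(ok)) {
    pts <- SpatialPoints(
      cbind(df$origin_lon[ok], df$origin_lat[ok]),  # longitude first
      proj4string = polys@proj4string               # reuse the polygon CRS
    )
    state[ok] <- as.character(over(pts, polys)$NAME)
  }
  state
}

toy_df$state <- lookup_state(toy_df, toy_states)
print(toy_df)
```

With the real data, the equivalent call would presumably be df$state <- lookup_state(df, usa). Rows with missing coordinates, or points outside every polygon, keep NA.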

Regarding "r - How to automatically determine the state given latitude and longitude/coordinates?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51266193/
