r - How to geocode simple addresses with the Data Science Toolkit

Tags: r maps geocoding

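I got fed up with Google's geocoding and decided to try an alternative. The Data Science Toolkit (http://www.datasciencetoolkit.org) lets you geocode an unlimited number of addresses. R has an excellent package that serves as a wrapper for its functions (CRAN:RDSTK). The package has a function called street2coordinates() that interfaces with the Data Science Toolkit's geocoding utility.
However, the RDSTK function street2coordinates() does not work if you try to geocode something simple such as City, Country. In the following example I try to use the function to get the latitude and longitude of Phoenix: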

> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)
The utility on the Data Science Toolkit site itself works perfectly. This is the URL request that gives the answer:
http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=Phoenix+Arizona+United+States
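I am interested in geocoding multiple addresses (both complete addresses and city names). I know the Data Science Toolkit URL will work well.
How do I interact with the URL and get multiple latitudes and longitudes into a data frame along with the addresses?
Here is a sample dataset: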
dff <- data.frame(address=c(
  "Birmingham, Alabama, United States",
  "Mobile, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States",
  "Little Rock, Arkansas, United States",
  "Berkeley, California, United States",
  "Duarte, California, United States",
  "Encinitas, California, United States",
  "La Jolla, California, United States",
  "Los Angeles, California, United States",
  "Orange, California, United States",
  "Redwood City, California, United States",
  "Sacramento, California, United States",
  "San Francisco, California, United States",
  "Stanford, California, United States",
  "Hartford, Connecticut, United States",
  "New Haven, Connecticut, United States"
  ))
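
The GET request shown earlier can be generated for every address by URL-encoding each one and appending it to the endpoint. The following is a minimal sketch, not part of the original post: `build_geocode_url` is a hypothetical helper name, the endpoint is copied from the URL in the question, and only base R's `URLencode` is used. It only builds the request strings and does not send them.

```r
# Endpoint copied from the URL in the question.
base_url <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"

# Hypothetical helper: build one request URL per address.
build_geocode_url <- function(address) {
  # reserved = TRUE also percent-encodes characters such as "," and "&"
  encoded <- vapply(address, URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0(base_url, "?sensor=false&address=", encoded)
}

addresses <- c("Phoenix, Arizona, United States",
               "Mobile, Alabama, United States")
urls <- build_geocode_url(addresses)
print(urls)
```

Percent-encoding the whole address avoids having to replace spaces with `+` by hand, as was done in the street2coordinates() call in the question.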

Best answer

Like this:

library(httr)
library(rjson)

data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url  <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
json     <- fromJSON(content(response,type="text"))
# lapply always returns a list; sapply may simplify to a matrix when every address succeeds
geocode  <- do.call(rbind,lapply(json,
                                 function(x) c(long=x$longitude,lat=x$latitude)))
geocode
#                                                long      lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States            -88.10318 30.70114
# La Jolla, California, United States      -117.87645 33.85751
# Duarte, California, United States        -118.29866 33.78659
# Little Rock, Arkansas, United States      -91.20736 33.60892
# Tucson, Arizona, United States           -110.97087 32.21798
# Redwood City, California, United States  -117.88536 35.18713
# New Haven, Connecticut, United States     -72.92751 41.36571
# Berkeley, California, United States      -122.29673 37.86058
# Hartford, Connecticut, United States      -72.76356 41.78516
# Sacramento, California, United States    -121.55541 38.38046
# Encinitas, California, United States     -116.84605 33.01693
# Birmingham, Alabama, United States        -86.80190 33.45641
# Stanford, California, United States      -122.16750 37.42509
# Orange, California, United States        -117.85311 33.78780
# Los Angeles, California, United States   -117.88536 35.18713
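This takes advantage of the POST interface to the street2coordinates API (documented here), which returns all the results in 1 request, rather than using multiple GET requests.
The absence of Phoenix seems to be a bug in the street2coordinates API. If you go to the API demo page and try "Phoenix, Arizona, United States", you get a null response. However, as your example shows, using their "Google-style Geocoder" does give a result for Phoenix. So here is a solution using repeated GET requests. Note that this runs much slower.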
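
The POST output above comes back in a different order than the input and silently drops any address the service could not resolve. Below is a hedged sketch of one way to realign the results with the input and keep unresolved addresses as NA rows. The `parsed` list is a mocked stand-in for the parsed response (an assumption about its shape, with values copied from the output above), and `pick` is a hypothetical helper; no request is sent.

```r
addresses <- c("Mobile, Alabama, United States",
               "Phoenix, Arizona, United States",
               "Tucson, Arizona, United States")

# Same construction as in the answer: a JSON array of quoted strings.
body <- paste0("[", paste(paste0("\"", addresses, "\""), collapse = ","), "]")

# Mocked parsed response (assumption): a named list in arbitrary order,
# with NULL for an address the service could not resolve.
parsed <- list(
  "Tucson, Arizona, United States"  = list(longitude = -110.97087, latitude = 32.21798),
  "Mobile, Alabama, United States"  = list(longitude = -88.10318,  latitude = 30.70114),
  "Phoenix, Arizona, United States" = NULL
)

# Hypothetical helper: look up one field by address name, NA if unavailable.
pick <- function(addr, field) {
  hit <- parsed[[addr]]
  if (is.null(hit) || is.null(hit[[field]])) NA_real_ else as.numeric(hit[[field]])
}

# One row per input address, in input order.
geocode <- data.frame(
  address = addresses,
  long = vapply(addresses, pick, numeric(1), field = "longitude", USE.NAMES = FALSE),
  lat  = vapply(addresses, pick, numeric(1), field = "latitude",  USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(geocode)
```

Keeping the NA rows makes it easy to see which addresses still need another geocoder.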
geo.dsk <- function(addr){ # single address geocode with data sciences toolkit
  require(httr)
  require(rjson)
  url      <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  response <- GET(url,query=list(sensor="FALSE",address=addr))
  json <- fromJSON(content(response,type="text"))
  loc  <- json['results'][[1]][[1]]$geometry$location
  return(c(address=addr,long=loc$lng, lat= loc$lat))
}
result <- do.call(rbind,lapply(as.character(dff$address),geo.dsk))
result <- data.frame(result)
result
#                                     address         long        lat
# 1        Birmingham, Alabama, United States   -86.801904  33.456412
# 2            Mobile, Alabama, United States   -88.103184  30.701142
# 3           Phoenix, Arizona, United States -112.0733333 33.4483333
# 4            Tucson, Arizona, United States  -110.970869  32.217975
# 5      Little Rock, Arkansas, United States   -91.207356  33.608922
# 6       Berkeley, California, United States   -122.29673  37.860576
# 7         Duarte, California, United States  -118.298662  33.786594
# 8      Encinitas, California, United States  -116.846046  33.016928
# 9       La Jolla, California, United States  -117.876447  33.857515
# 10   Los Angeles, California, United States  -117.885359  35.187133
# 11        Orange, California, United States  -117.853112  33.787795
# 12  Redwood City, California, United States  -117.885359  35.187133
# 13    Sacramento, California, United States  -121.555406  38.380456
# 14 San Francisco, California, United States  -117.885359  35.187133
# 15      Stanford, California, United States    -122.1675   37.42509
# 16     Hartford, Connecticut, United States   -72.763564   41.78516
# 17    New Haven, Connecticut, United States   -72.927507  41.365709
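
Because `c(address=addr, long=..., lat=...)` mixes a string with numbers, the coordinates in `result` end up as text rather than numeric values, and an address with no match would fail when indexing into an empty `results` list. The sketch below shows one possible way to handle both, under stated assumptions: `extract_location` is a hypothetical helper, and `found` and `empty` are mocked parsed responses shaped the way the code above indexes into them. No request is sent.

```r
# Hypothetical helper: turn one parsed Google-style response into a one-row data frame.
extract_location <- function(addr, json) {
  results <- json$results
  if (length(results) == 0) {
    # No match: keep the address and mark the coordinates as missing.
    return(data.frame(address = addr, long = NA_real_, lat = NA_real_,
                      stringsAsFactors = FALSE))
  }
  loc <- results[[1]]$geometry$location
  data.frame(address = addr,
             long = as.numeric(loc$lng),
             lat  = as.numeric(loc$lat),
             stringsAsFactors = FALSE)
}

# Mocked parsed responses (assumption about the response shape).
found <- list(results = list(list(geometry = list(
  location = list(lat = 33.4483333, lng = -112.0733333)))))
empty <- list(results = list())

result <- rbind(
  extract_location("Phoenix, Arizona, United States", found),
  extract_location("Nowhere, Nowhere", empty)
)
str(result)
```

Inside `geo.dsk`, a helper like this would take the place of the final indexing and `return` lines, with `rbind` then producing numeric `long` and `lat` columns.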

Source: a similar question on Stack Overflow about r - How to geocode simple addresses with the Data Science Toolkit: https://stackoverflow.com/questions/22887833/
