r - Unable to scrape a table from Wikipedia

I can't make sense of the accepted answer to this question. The table I'm trying to scrape is this list of U.S. state populations.

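library(XML)
# Page holding the table (plain http://, as in the original attempt)
theurl <- "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
# Parse every <table> on the page into a list of data frames
tables <- readHTMLTable(theurl)
# Number of rows in each parsed table
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))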
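For reference, the last line just counts the rows of each parsed table. A minimal offline sketch of what it computes, using two toy data frames as hypothetical stand-ins for the list that readHTMLTable() returns:

```r
# Two toy data frames standing in for readHTMLTable()'s output
# (hypothetical data, not the real Wikipedia tables).
tables <- list(
  states = data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE),
  notes  = data.frame(note = "x", stringsAsFactors = FALSE)
)

# The expression from the question: a named integer vector of row counts.
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
print(n.rows)

# An equivalent, type-stable alternative using vapply() and nrow().
n.rows2 <- vapply(tables, nrow, integer(1))
```

Here n.rows is c(states = 3L, notes = 1L); vapply() is a design choice that guarantees one integer per table rather than relying on unlist() to flatten the result.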

This is the error I get:

Error: failed to load external entity "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"

What gives?

(Note: although I'm asking how to fix this error, I'd also appreciate it if you could point me to a simpler way to get the population data.)

Best answer

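There is nothing wrong with your code. However, there is something wrong with your URL.

You can test this by going into a shell and verifying that the external input to your code is not what is causing it to fail, e.g.,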

curl https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population
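The same kind of check can be run from inside R. A minimal sketch, assuming R >= 3.3.0 built with libcurl (needed for curlGetHeaders()); the helper names check_url() and redirect_target() are hypothetical. Instead of looking at the body, it looks at the HTTP status and any redirect the server sends back for the URL:

```r
# Hypothetical helper: extract the Location header (redirect target) from a
# vector of raw header lines, or NA if there is none.
redirect_target <- function(hdrs) {
  loc <- grep("^location:", hdrs, ignore.case = TRUE, value = TRUE)
  if (length(loc) == 0) return(NA_character_)
  # Drop the header name and surrounding whitespace / CRLF.
  trimws(sub("^location:", "", loc[1], ignore.case = TRUE))
}

# Hypothetical helper: fetch only the response headers for `url`, without
# following redirects, and report the status code plus any redirect target.
check_url <- function(url) {
  hdrs <- curlGetHeaders(url, redirect = FALSE)
  list(status = attr(hdrs, "status"), location = redirect_target(hdrs))
}
```

Usage (requires network access): check_url("http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"). If the result is a 3xx status pointing at an https:// address, that is consistent with the diagnosis that the URL, not the R code, is the problem.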

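This returns an empty body, just like your R code. That should convince you that the problem is not your R code. Having established that, you can move on to the part of the page you are actually interested in, again using the free and simple test environment that curl gives you, and run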

curl https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population#States_and_territories

which definitely does not return an empty result:

...
<body class="mediawiki ltr sitedir-ltr ns-0 ns-subject page-List_of_U_S_states_and_territories_by_population skin-vector action-view">
    <div id="mw-page-base" class="noprint"></div>
    <div id="mw-head-base" class="noprint"></div>
    <div id="content" class="mw-body" role="main">
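Since the URL is the problem rather than the parsing code, one possible workaround is to keep readHTMLTable() for parsing but do the download with a client that handles HTTPS, then parse the local copy. This swaps in download.file() for the fetch and is not the answer's own method. A minimal sketch, assuming the XML package is installed and R >= 3.2.0 with libcurl support; read_tables_https() and parse_tables() are hypothetical helper names:

```r
library(XML)

# Hypothetical helper: parse every <table> in a local HTML file into a
# list of data frames.
parse_tables <- function(path) {
  doc <- htmlParse(path, encoding = "UTF-8")
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Hypothetical helper: download the page with libcurl (which speaks HTTPS)
# to a temporary file, then parse that file instead of letting XML fetch
# the URL itself.
read_tables_https <- function(url) {
  tmp <- tempfile(fileext = ".html")
  on.exit(unlink(tmp))
  download.file(url, tmp, method = "libcurl", quiet = TRUE)
  parse_tables(tmp)
}
```

Usage (requires network access): tables <- read_tables_https("https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"), after which the question's n.rows line works unchanged. Which list element holds the state table depends on the page's current layout, so inspect the result before indexing into it.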

A similar question about "r - Unable to scrape a table from Wikipedia" can be found on Stack Overflow: https://stackoverflow.com/questions/32342919/
