xml - Parsing iTunes RSS in R

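I'm trying to parse the iTunes top 100 in R and spit out the artist, song, and so on, but I suspect I'm running into a problem with the XML file. I was able to get usable data from Billboard's RSS feed ( http://www1.billboard.com/rss/charts/hot-100 ) without any trouble: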

library(XML)

GetBillboard <- function() {
  # Parse the Billboard Hot 100 RSS feed and collect every <item> node
  hot.100 <- xmlTreeParse("http://www1.billboard.com/rss/charts/hot-100")
  hot.100 <- xpathApply(xmlRoot(hot.100), "//item")

  top.songs <- character(length(hot.100))

  # Keep the value of the third child element of each <item>
  for (i in seq_along(hot.100)) {
    top.songs[i] <- xmlSApply(hot.100[[i]], xmlValue)[3]
  }
  return(top.songs)
}

However, when I try a similar strategy with iTunes ( https://itunes.apple.com/us/rss/topmusicvideos/limit=100/explicit=true/xml ):

library(XML)
library(RCurl)  # provides getURL()

GetITunes <- function() {
  # Download the feed as text, parse it, then look for <entry> nodes
  itunes.raw <- getURL("https://itunes.apple.com/us/rss/topmusicvideos/limit=100/explicit=true/xml")
  itunes.xml <- xmlTreeParse(itunes.raw)
  top.vids <- xpathApply(xmlRoot(itunes.xml), "//entry")
  return(top.vids)
}

I just get nonsense, an empty node set:

> m <- GetITunes()
> m
list()
attr(,"class")
[1] "XMLNodeSet"
>

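I'm guessing it has to do with the format of the XML file. How can I get this iTunes data into a structure similar to what the first function has from Billboard at this point?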

hot.100 <- xpathApply(xmlRoot(hot.100), "//item")

Thanks!

Best answer

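The problem is that your XML document has a default namespace and your XPath expression doesn't account for it. Unfortunately, when a document declares a default namespace, you have to refer to it explicitly in the XPath: an unprefixed name such as entry only matches elements that are in no namespace at all. This should work: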

xpathApply(xmlRoot(itunes.xml), "//d:entry", 
    namespaces=c(d="http://www.w3.org/2005/Atom"))

Here we arbitrarily choose d as a prefix bound to the default namespace used in the XML document, and then use that prefix in our XPath expression.
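
To see the effect without hitting the network, here is a minimal sketch that runs the same kind of query against a small hand-written Atom-style document. The sample feed, the variable names, and the assumption that the artist lives in an im:artist element bound to http://itunes.apple.com/rss are illustrative only, so check them against the real feed before relying on them.

```r
library(XML)

# A tiny hand-written Atom-style feed standing in for the iTunes response
# (hypothetical sample data, not real chart entries).
feed <- '<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:im="http://itunes.apple.com/rss">
  <entry>
    <title>Song A - Artist A</title>
    <im:artist>Artist A</im:artist>
  </entry>
  <entry>
    <title>Song B - Artist B</title>
    <im:artist>Artist B</im:artist>
  </entry>
</feed>'

doc <- xmlParse(feed, asText = TRUE)

# Prefix-to-URI map used by the XPath queries. "d" is an arbitrary
# prefix for the default (Atom) namespace; "im" is assumed to match
# the namespace the feed uses for its iTunes-specific elements.
ns <- c(d  = "http://www.w3.org/2005/Atom",
        im = "http://itunes.apple.com/rss")

# Unprefixed name: matches nothing, as in the question
no.ns <- xpathSApply(doc, "//entry", xmlValue)

# Prefixed names: one value per <entry>
titles  <- xpathSApply(doc, "//d:entry/d:title", xmlValue, namespaces = ns)
artists <- xpathSApply(doc, "//d:entry/im:artist", xmlValue, namespaces = ns)

top.vids <- data.frame(title = titles, artist = artists,
                       stringsAsFactors = FALSE)
```

Every step of the path needs a prefix, not only the first one, because the child elements inherit the default namespace too. The resulting character vectors have the same shape as the top.songs vector built by the Billboard function.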

Regarding xml - Parsing iTunes RSS in R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25517969/
