xml - clj-xpath for XML with simple and nested tags

Tags: xml xpath clojure

I have a function (contents-only) that extracts content from XML using the clj-xpath library.

(ns example
  (:use [clj-xpath.core]))

(def data-url
  "http://api.eventful.com/rest/events/search?app_key=4H4Vff4PdrTGp3vV&keywords=music&location=New+York&date=Future")

(defn xml-data [url] (slurp url))

(defn defxmldoc [url]
  (xml->doc (xml-data url)))

(defn contents-only [url root-tag tags]
  (vec (map (fn [item]
              (into {}
                    (map (fn [tag]
                           [tag ($x:text (str "./" (name tag)) item)])
                         tags)))
            (take 5 ($x root-tag (defxmldoc url))))))

The function call looks like this:

(contents-only data-url "/search/events/event" [:title :url])

It works well for non-nested tags, but not when I try to extract text from a nested tag such as name in the following fragment:

<performers>
  <performer>
    <id>P0-001-000009049-1</id>
    <url>...</url>
    <name>Lindsey Buckingham</name>
    <short_bio>Rock</short_bio>
    <creator>TomAzoff</creator>
    <linker>evdb</linker>
  </performer>
</performers>

With this function call:

(contents-only data-url "/search/events/event" [:title :url :name])

I get this error: RuntimeException Error, more (or less) than 1 result (0) from xml({:children...) for xpath(./name)  clj-xpath.core/throwf (core.clj:26)

How can I change my contents-only function so that I can pass nested tags as well?

Best answer

The quickest fix: change "./" to ".//" in the contents-only function.

user> (first (contents-only data-url "/search/events/event" [:title :id :name]))
{:title "Legally Blonde the Musical", :id "P0-001-000351944-7", :name "Legally Blonde The Musical"}
user> 

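As described in the XPath documentation, .//name selects every name node that is a descendant of the current node, no matter how deep it sits in the hierarchy, whereas ./name only matches direct children.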
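To make the difference concrete, here is a minimal sketch that runs against an inline XML string instead of the live API. The sample document and the names sample-xml, event, child-names and descendant-names are made up for illustration. It uses $x:text*, the clj-xpath variant that returns the text of all matches (zero or more), so neither expression throws.

```clojure
(ns example.axes
  (:require [clj-xpath.core :refer [$x $x:text*]]))

;; Hypothetical sample data, shaped like the response in the question.
(def sample-xml
  "<search><events><event>
     <title>Some Show</title>
     <performers><performer>
       <id>P0-001-000009049-1</id>
       <name>Lindsey Buckingham</name>
     </performer></performers>
   </event></events></search>")

;; The first (and only) event node of the sample.
(def event (first ($x "/search/events/event" sample-xml)))

;; ./name only looks at direct children of <event>: nothing matches.
(def child-names ($x:text* "./name" event))

;; .//name looks at every descendant of <event>: the performer's name matches.
(def descendant-names ($x:text* ".//name" event))
```

With $x:text in place of $x:text*, the "./name" lookup raises the "more (or less) than 1 result (0)" error from the question, because $x:text insists on exactly one match.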

If name is not unique within an event, this may not be what you want. One alternative is to spell out the full path to each tag, e.g.

(contents-only data-url "/search/events/event"
                [[:title]
                 [:performers :performer :id]
                 [:performers :performer :name]])

and to add a few helper functions, such as:

(defn build-path
  ([sep kys] (build-path nil sep kys))
  ([root sep kys]
   (->> kys (map name) (interpose sep) 
        (concat (when root (list root sep))) (apply str))))

(defn path
  "build a path from a collection"
  [t]
  (build-path "." \/ t))

user> (path [:performers :performer :id])
"./performers/performer/id"

(defn path-key
  "Transform [:a :b :c] into :a-b-c"
  [t]
  (->> t (build-path \-) keyword))

user> (path-key [:performers :performer :id])
:performers-performer-id

contents-only then becomes:

(defn contents-only2 [url root-tag tags]
  (vec (map (fn [item]
              (into {}
                    (map (fn [tag]
                           [(path-key tag) ($x:text (path tag) item)])
                         tags)))
            (take 5 ($x root-tag (defxmldoc url))))))

Result:

user> (first (contents-only2 data-url "/search/events/event"
                      [[:title]
                       [:performers :performer :id]
                       [:performers :performer :name]]))
{:title "Legally Blonde the Musical", :performers-performer-id "P0-001-000351944-7", :performers-performer-name "Legally Blonde The Musical"}
user> 
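Note that $x:text still requires exactly one match per path, so contents-only2 would raise the same error for an event that lists several performers, or none. The following is only a sketch of a more tolerant variant, not part of the original answer: it collects all matches into a vector with $x:text*. The name contents-only3 and the sample data are hypothetical, path and path-key are compact stand-ins for the helpers above, and the function takes an XML string rather than a URL so that it can be tried offline.

```clojure
(ns example.tolerant
  (:require [clj-xpath.core :refer [$x $x:text*]]
            [clojure.string :as str]))

;; Hypothetical sample: one event with two performers, one with none.
(def sample-xml
  "<search><events>
     <event><title>A</title>
       <performers>
         <performer><name>X</name></performer>
         <performer><name>Y</name></performer>
       </performers></event>
     <event><title>B</title><performers/></event>
   </events></search>")

;; [:a :b :c] -> "./a/b/c"
(defn path [ks] (str "./" (str/join "/" (map name ks))))

;; [:a :b :c] -> :a-b-c
(defn path-key [ks] (keyword (str/join "-" (map name ks))))

(defn contents-only3
  "Like contents-only2, but every value is a vector of all matching texts."
  [xml root-tag tags]
  (mapv (fn [item]
          (into {}
                (map (fn [tag]
                       [(path-key tag) (vec ($x:text* (path tag) item))])
                     tags)))
        ($x root-tag xml)))

(def result
  (contents-only3 sample-xml "/search/events/event"
                  [[:title] [:performers :performer :name]]))
```

Each value is always a vector, which keeps the result shape uniform at the cost of wrapping single values such as the title.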

For xml - clj-xpath for XML with simple and nested tags, see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/28516252/