javascript - Recursive XML parsing function not working as expected

Tags: javascript node.js xml recursion xml-parsing

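I am trying to parse an XML document and use the data to build a (simpler) JSON object of the following form:

{id: '1', name: 'content-types', children: [{id: '2', name: 'requirements', children: [... and so on ...]}]}

My XML has nodes like the one shown below (I have included only one; they can be nested arbitrarily deep):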

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head/>
  <body class="taxonomies">
    <div class="taxonomy">
      <span class="id">3484069771</span>
      <span class="name">Content Types</span>
      <span class="locale">en</span>
      <div class="concepts">
        <div class="concept">
          <span class="id">3484058507</span>
          <span class="name">Promotional Publications</span>
          <div class="concepts">
            <div class="concept">
              <span class="id">3551765771</span>
              <span class="name">Datasheets</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>

I use the following code to build the JSON tree from the XML:

buildConceptTree: function(xml){
    const doc = new dom().parseFromString(xml)
    var tree = []
    var selector = "//*[@class='taxonomies']"
    var count = 0  // this should keep track of the depth of the node being used
    function recurse(s, odd){
        var nodes
        console.log(count)
        console.log(s)
        var arr = []

        nodes = xpath.select(s, doc)
        nodes.forEach(node => {
            try {
                var children = node.childNodes
                var keys = Object.keys(children).filter(x => {return Number(x)})
                keys.forEach(key => {
                    var child = children[key]
                    console.log('child is: ')
                    console.log(child)
                    var obj = {}
                    var grandchildren = child.childNodes
                    var grandkeys = Object.keys(grandchildren).filter(x => {return Number(x)})

                    grandkeys.forEach(gk => {
                        var gc = grandchildren[gk]
                        try {
                            var nodevalue = gc['attributes'][0]['nodeValue']
                            switch(nodevalue){
                            case 'id':
                                obj['id'] = gc['textContent']
                            case 'name':
                                obj['name'] = gc['textContent']
                            case 'concepts':
                                count++
                                var rx = /taxonomy/
                                    if(!rx.test(s)){
                                        s = s+"/*[@class='taxonomy']"
                                    }
                                else{
                                    s = s
                                }
                                if (!odd){
                                    s += "/*[@class='concepts']"
                                }
                                else {
                                    s += "/*[@class='concept']"
                                }
                                odd = !odd
                                obj['children'] = recurse(s, odd)
                            }
                        }
                        catch(e){
                        }
                    })
                    arr.push(obj)
                })
            }
            catch(e){
            }

        })
        return arr


    }

    var tree = recurse(selector, false)
    return tree

},

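As it stands, the function produces something resembling the JSON form I described, but many nodes are missing.

In addition, my recursive function does not seem to terminate in the simplest case, because it keeps recursing down deeper branches of the XML tree. For example, I logged the following in the console, even though no node is 191 levels deep: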

    191
     parser.js?d3c4:83 //*[@class='taxonomies']/*[@class='taxonomy']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']
     parser.js?d3c4:92 child is:

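Can anyone help me figure out how to change this function so that it retrieves the data I want?

Best answer

I may have missed some requirements, but once you stop looping over all elements and start querying for the exact elements you expect, the problem seems much less complicated: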

// Parse the xml string to a document
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(
  getXML(),
  "text/xml"
);

// The main logic to go from an xml element to an object
const parseTaxonomy = (taxonomy, id = 1) => ({
  id,
  name: taxonomy.querySelector(".name")
    .innerText
    .toLowerCase()
    .replace(/\s/g, "-"),
  children: Array.from(
    (taxonomy.querySelector(".concepts") || { children: [] })
      .children
  ).map(t => parseTaxonomy(t, ++id)) // Note the ++
});

// Run on the first taxonomy
// If the top level contains multiple elements, use .map
console.log(
  parseTaxonomy(
    xmlDoc.querySelector(".taxonomy")
  )
);

// The data
function getXML() {
  return `<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head/>
  <body class="taxonomies">
    <div class="taxonomy">
      <span class="id">3484069771</span>
      <span class="name">Content Types</span>
      <span class="locale">en</span>
      <div class="concepts">
        <div class="concept">
          <span class="id">3484058507</span>
          <span class="name">Promotional Publications</span>
          <div class="concepts">
            <div class="concept">
              <span class="id">3551765771</span>
              <span class="name">Datasheets</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>`;
}
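
One thing to watch in the snippet above: `id` is a parameter local to each call, so `++id` restarts the numbering from each node's own id, and nodes in different branches can end up sharing an id (the sample XML is a single chain, so it does not show this). If unique sequential ids are wanted, a shared counter is one option. The following is a minimal sketch on plain objects rather than DOM elements; `numberTree`, `visit`, and the sample `tree` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: assigns unique, sequential ids in depth-first order.
// It works on plain { name, children } objects, not on DOM elements.
function numberTree(root) {
  let nextId = 1; // one counter shared by every recursive call via the closure
  const visit = node => ({
    id: nextId++, // evaluated before the children are visited
    name: node.name,
    children: (node.children || []).map(child => visit(child)),
  });
  return visit(root);
}

// Made-up sample data with two branches
const tree = {
  name: "content-types",
  children: [
    { name: "promotional-publications", children: [{ name: "datasheets" }] },
    { name: "requirements" },
  ],
};

const numbered = numberTree(tree);
console.log(JSON.stringify(numbered, null, 2));
```

With the per-call `++id` approach, "datasheets" and "requirements" in this sample would both receive id 3; with the shared counter they receive 3 and 4.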

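Note: I changed the part where you had placed a comment, because the comment was not closed, and I expected it to have another wrapper around the sub-taxonomies.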
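
As for why the function in the question kept recursing and dropped nodes, two details of that code look like probable contributors. This is a reading of the snippet, not something verified against the asker's data. The `switch` has no `break` statements, so a match on `'id'` falls through into the `'name'` and `'concepts'` branches and triggers a recursive call for every `id` span. Separately, `filter(x => Number(x))` discards the key `"0"`, because `Number("0")` is `0`, which is falsy. The sketch below reproduces both effects in isolation; `classify`, `classifyFixed`, and the `keys` array are hypothetical.

```javascript
// Without `break`, every case below the matching one also runs.
function classify(cls) {
  const hits = [];
  switch (cls) {
    case "id":
      hits.push("id");
    case "name":
      hits.push("name");
    case "concepts":
      hits.push("concepts");
  }
  return hits;
}

// With `break`, only the matching case runs.
function classifyFixed(cls) {
  const hits = [];
  switch (cls) {
    case "id":
      hits.push("id");
      break;
    case "name":
      hits.push("name");
      break;
    case "concepts":
      hits.push("concepts");
      break;
  }
  return hits;
}

console.log(classify("id"));      // id, name and concepts are all recorded
console.log(classifyFixed("id")); // only id is recorded

// Stand-in for Object.keys() of a node list: indices plus a non-numeric key.
const keys = ["0", "1", "2", "length"];
const lossy = keys.filter(x => Number(x));      // drops "0" as well as "length"
const safe = keys.filter(x => /^\d+$/.test(x)); // keeps every numeric index
console.log(lossy, safe);
```

Adding `break` to each case and iterating the child list by index (or with `Array.from`) would avoid both problems, although querying for the expected elements directly, as in the answer, sidesteps them entirely.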

Regarding "javascript - Recursive XML parsing function not working as expected", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51986159/
