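Looking at the source of the encoding/xml package, all of the decoding logic (decoding the actual XML nodes and typing them) lives in unmarshal, and essentially the only way to invoke it is by calling DecodeElement. However, the decoding logic also inherently searches for the next EndElement. The main reason for this appears to be validation.
This looks like a major design flaw to me: what if I have a huge XML file whose structure I am confident in, and I just want to decode one node at a time so that I can filter through the data efficiently on the fly? RawToken() can be used to get the current tag, which is great, but when you then call DecodeElement() on it, the inevitable unmarshal() call fails as soon as it runs into nodes in a way it considers unbalanced.
In theory it seems possible to encounter a token I want to decode, capture the offset, decode the element, seek back to the previous position, and loop, but that would still result in a lot of unnecessary processing.
Is there a way to parse just one node at a time?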
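For reference, the failure described above can be reproduced with a short sketch like the following. The `Item` type, the `decodeAfterRawToken` helper, and the sample document are made up for illustration. The assumption here is that RawToken() does not record the start element for later matching, so the end tag that DecodeElement() eventually reads is reported as unbalanced.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Item is a hypothetical element type used only for this illustration.
type Item struct {
	Name string `xml:"name,attr"`
}

// decodeAfterRawToken (hypothetical helper) reads tokens with RawToken and
// tries to decode the first <item> start element it sees. It returns
// whatever error DecodeElement reports.
func decodeAfterRawToken(doc string) error {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		t, err := d.RawToken()
		if err != nil {
			return err
		}
		if se, ok := t.(xml.StartElement); ok && se.Name.Local == "item" {
			var it Item
			// The start tag was consumed by RawToken, not Token, so the
			// decoder has no record of it when it meets </item>.
			return d.DecodeElement(&it, &se)
		}
	}
}

func main() {
	err := decodeAfterRawToken(`<items><item name="a"></item></items>`)
	fmt.Println("error:", err)
}
```

Replacing RawToken() with Token() in this sketch should make the same DecodeElement() call succeed.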
Best answer
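What you are describing is called XML stream parsing, as done by any SAX parser. Good news: encoding/xml supports this, although it is a bit hidden.
What you actually have to do is create an instance of xml.Decoder, passing it an io.Reader. Then you use Decoder.Token() to read the input stream until the next valid XML token is found. From there, you can decide what to do next.
Here is a small example (also available as a gist, or you can run it on the Go Playground):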
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

const (
book = `<?xml version="1.0" encoding="UTF-8"?>
<book>
<preface>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</preface>
<chapter num="1" title="Foo">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
<chapter num="2" title="Bar">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
</book>`
)

type Chapter struct {
	Num     int    `xml:"num,attr"`
	Title   string `xml:"title,attr"`
	Content string `xml:",chardata"`
}

func main() {
	// We emulate a file or network stream
	b := bytes.NewBufferString(book)
	// And set up a decoder
	d := xml.NewDecoder(b)
	for {
		// We look for the next token.
		// Note that this only reads until the next positively identified
		// XML token in the stream.
		t, err := d.Token()
		if err != nil {
			// Token returns io.EOF at the end of the input;
			// any other error also ends the loop here.
			break
		}
		switch et := t.(type) {
		case xml.StartElement:
			// We now have to inspect whether we are interested in the
			// element; otherwise we simply advance.
			if et.Name.Local == "chapter" {
				// Most often/likely element first
				var c Chapter
				// We decode the element into c (automagically advancing the stream).
				// If no matching end token is found, there will be an error.
				// Note that the search only happens within the parent.
				if err := d.DecodeElement(&c, &et); err != nil {
					panic(err)
				}
				// We have found what we are interested in, so we print it
				fmt.Printf("%d: %s\n", c.Num, c.Title)
			} else if et.Name.Local == "book" {
				fmt.Println("Book begins!")
			}
		case xml.EndElement:
			if et.Name.Local != "book" {
				continue
			}
			fmt.Println("Finished processing book!")
		}
	}
}
Regarding "xml - Go: decode only one XML node at a time", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34958199/