I have a huge XML file that looks like this:
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
  [... one gazillion more entries ...]
</catalog>
I want to iterate over this file as a stream, so that I never have to load the whole file into memory. For example:
InputStream stream = new FileInputStream("gigantic-book-list.xml");
String nodeName = "book";
Iterator<Document> it = new StreamingXmlIterator(stream, nodeName);
Document bk101 = it.next();
Document bk102 = it.next();
I would also like it to work with different XML input files without having to create specific classes (e.g. Book.java).
@McDowell has a promising approach using XMLStreamReader and StreamFilter at https://stackoverflow.com/a/16799693/13365 , but it only extracts a single node.
Also, Camel's .tokenizeXML does exactly what I want, so I suppose I should take a look at its source code.
Best answer
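// javax.xml.bind is assumed here; newer JAXB releases use jakarta.xml.bind
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Book {
    // TODO: getters/setters
    public String author;
    public String title;
}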
Assuming you want to process the data as strongly typed objects, you can combine StAX and JAXB with the help of two utility types:
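import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLStreamReader;

// Accepts only the events from <book> through its matching </book>.
// Nested <book> elements are not handled.
class ContentFinder implements StreamFilter {
    private boolean capture = false;

    @Override
    public boolean accept(XMLStreamReader xml) {
        if (xml.isStartElement() && "book".equals(xml.getLocalName())) {
            capture = true;
        } else if (xml.isEndElement() && "book".equals(xml.getLocalName())) {
            capture = false;
            return true; // the closing tag itself is still part of the book
        }
        return capture;
    }
}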
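import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Reports "no more events" once the underlying reader sits on </book>,
// so a consumer that honors hasNext() stops at the end of the current book.
class Limiter extends StreamReaderDelegate {
    Limiter(XMLStreamReader xml) {
        super(xml);
    }

    @Override
    public boolean hasNext() throws XMLStreamException {
        return !(getParent().isEndElement()
                && "book".equals(getParent().getLocalName()));
    }
}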
Usage:
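import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;

// inputStream is assumed to be an open InputStream over the XML file
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Unmarshaller unmar = JAXBContext.newInstance(Book.class)
        .createUnmarshaller();
Transformer tformer = TransformerFactory.newInstance().newTransformer();
while (reader.hasNext()) {
    // Copy the current <book> subtree into a DOM, then unmarshal it
    XMLStreamReader limiter = new Limiter(reader);
    Source src = new StAXSource(limiter);
    DOMResult res = new DOMResult();
    tformer.transform(src, res);
    Book book = (Book) unmar.unmarshal(res.getNode());
    System.out.println(book.title);
}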
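If you would rather not bind to a class like Book at all, the same StAX-plus-Transformer idea can be wrapped in an iterator that hands back one DOM Document per matching element, which is roughly the shape the question asks for. The following is only a minimal sketch under some assumptions: the class name StreamingXmlIterator is taken from the question's pseudocode and is not an existing library class, elements are matched by local name only (namespaces are ignored), and matching elements nested inside a match are returned as part of the outer one. Because a transformer may leave the reader either on the closing tag or one event past it, the sketch re-checks the current event before advancing.

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

/** Hypothetical helper: yields one DOM Document per element named nodeName. */
class StreamingXmlIterator implements Iterator<Document> {
    private final XMLStreamReader reader;
    private final Transformer copier;   // identity transform: StAX in, DOM out
    private final String nodeName;
    private boolean positioned = false; // true while the reader sits on an unconsumed match

    StreamingXmlIterator(InputStream stream, String nodeName) {
        try {
            this.reader = XMLInputFactory.newFactory().createXMLStreamReader(stream);
            this.copier = TransformerFactory.newInstance().newTransformer();
        } catch (XMLStreamException | TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
        this.nodeName = nodeName;
    }

    @Override
    public boolean hasNext() {
        try {
            while (!positioned) {
                // Check the current event first: after a transform the reader
                // may already be sitting on the next matching start tag.
                if (reader.isStartElement() && nodeName.equals(reader.getLocalName())) {
                    positioned = true;
                } else if (reader.hasNext()) {
                    reader.next();
                } else {
                    return false; // end of document, no further matches
                }
            }
            return true;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Document next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            // Copies only the current element's subtree into a fresh Document.
            DOMResult result = new DOMResult();
            copier.transform(new StAXSource(reader), result);
            positioned = false;
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this in place, the loop from the question becomes new StreamingXmlIterator(stream, "book") followed by it.next() calls, and only one element's subtree is held in memory at a time.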
Regarding "java - How to iterate over the nodes of a huge XML file in a streaming fashion?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23676373/