I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<wddxPacket version="1.0">
<header />
<data>
<string>
<char code="0d" />
<char code="0a" />
Provider: HERO - 2.xx
<char code="0d" />
<char code="0a" />
<char code="0d" />
<char code="0a" />
<char code="0d" />
<char code="0a" />
DBvendor=EPA
<char code="0d" />
<char code="0a" />
Text-encoding=UTF-8
<char code="0d" />
<char code="0a" />
<char code="0d" />
<char code="0a" />
TY - RPRT
<char code="0d" />
<char code="0a" />
LB - 94742
<char code="0d" />
<char code="0a" />
AU - IARC,
<char code="0d" />
<char code="0a" />
LU - International Agency for Research on Cancer
<char code="0d" />
<char code="0a" />
PY - 1985
<char code="0d" />
<char code="0a" />
TY - JOUR
<char code="0d" />
<char code="0a" />
LB - 94743
<char code="0d" />
<char code="0a" />
AU - Shamilov, T. A.
<char code="0d" />
<char code="0a" />
AU - Abasov, D. M.
<char code="0d" />
<char code="0a" />
PY - 1973
<char code="0d" />
<char code="0a" />
J2 - Med Tr Prom Ekol
<char code="0d" />
<char code="0a" />
T2 - Meditsina Truda i Promyshlennaya Ekologiya
<char code="0d" />
<char code="0a" />
JF - Meditsina Truda i Promyshlennaya Ekologiya
<char code="0d" />
<char code="0a" />
SP - 12-15
<char code="0d" />
<char code="0a" />
SN - ISSN 1026-9428
<char code="0d" />
<char code="0a" />
TI - Effect of allyl chloride on animals under experimental conditions
<char code="0d" />
<char code="0a" />
VL - 8
<char code="0d" />
<char code="0a" />
ER -
<char code="0d" />
<char code="0a" />
<char code="0d" />
<char code="0a" />
TY - JOUR
<char code="0d" />
<char code="0a" />
</string>
</data>
</wddxPacket>
How can I parse it to get the text?
Provider: HERO - 2.xx
DBvendor=EPA
Text-encoding=UTF-8
TY - RPRT
LB - 94742
AU - IARC,
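I need the text from TY onward (this is a RIS-format file), but if I can only get all of the text, I can still manage. I searched online but could not find much. I need to do this in Java.
I tried this: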
Document doc = null;
DocumentBuilderFactory dbf = null;
DocumentBuilder docBuild = null;
dbf = DocumentBuilderFactory.newInstance();
docBuild = dbf.newDocumentBuilder();
doc = docBuild.parse(file);
Node node = doc.getDocumentElement();
XPathFactory xfact = XPathFactory.newInstance();
XPath xpath = xfact.newXPath();
String xpathStr = "/wddxPacket/header/";
Object res = xpath.evaluate(xpathStr, doc, XPathConstants.NODESET);
NodeList nodeList = (NodeList) res;
But I get nothing.
Best answer
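Your expression points at the header element, which is empty; the text sits inside the string element under data. You need the XPath //string/text() to get the text values.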
The following Java code gives you a list of the text values.
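import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// "file" is assumed here to hold the path of the XML file as a String
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse( new File( file ) );
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
// Select every text node that is a direct child of a <string> element
XPathExpression expr = xpath.compile( "//string/text()" );
Object eval = expr.evaluate( doc, XPathConstants.NODESET );
List<String> textValues = new ArrayList<String>();
if ( eval instanceof NodeList )
{
    NodeList list = (NodeList) eval;
    for ( int i = 0; i < list.getLength(); i++ )
    {
        Node node = list.item( i );
        // Trim the surrounding whitespace and skip nodes that are blank
        String text = node.getNodeValue().trim();
        if ( !text.isEmpty() )
        {
            System.out.println( text );
            textValues.add( text );
        }
    }
}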
The text values are collected in the variable textValues.
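The question also asks for only the part from the first TY line onward. A minimal sketch of one way to do that follows. It walks the children of the first string element, decodes each char element from its hexadecimal code attribute (0d and 0a are CR and LF), and then keeps the lines starting at the first TY tag. The class and method names (WddxText, extractText, fromFirstTy) are hypothetical and only for illustration. The sketch assumes that whitespace around the text nodes comes from pretty-printing and can be trimmed, and that every code value is a hexadecimal character code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original answer.
class WddxText {

    /** Rebuilds the content of the first <string> element, decoding <char code="xx"/> children. */
    static String extractText(String xml) {
        Document doc;
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the WDDX packet", e);
        }
        NodeList strings = doc.getElementsByTagName("string");
        if (strings.getLength() == 0) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        NodeList children = strings.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Assumption: surrounding whitespace is only XML pretty-printing.
                sb.append(child.getNodeValue().trim());
            } else if (child.getNodeType() == Node.ELEMENT_NODE
                    && "char".equals(child.getNodeName())) {
                // Assumption: code is a hexadecimal character code, e.g. "0d" or "0a".
                String code = ((Element) child).getAttribute("code");
                sb.append((char) Integer.parseInt(code, 16));
            }
        }
        return sb.toString();
    }

    /** Returns the lines from the first "TY -" line onward, or "" if there is none. */
    static String fromFirstTy(String text) {
        StringBuilder sb = new StringBuilder();
        boolean inRis = false;
        for (String line : text.split("\r?\n")) {
            if (!inRis && line.matches("TY\\s+-.*")) {
                inRis = true;
            }
            if (inRis) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened version of the packet from the question.
        String xml = "<wddxPacket version=\"1.0\"><header/><data><string>"
                + "<char code=\"0d\"/><char code=\"0a\"/>Provider: HERO - 2.xx"
                + "<char code=\"0d\"/><char code=\"0a\"/>TY - RPRT"
                + "<char code=\"0d\"/><char code=\"0a\"/>LB - 94742"
                + "</string></data></wddxPacket>";
        System.out.print(fromFirstTy(extractText(xml)));
    }
}
```

Unlike the XPath version, this keeps the line breaks encoded by the char elements, so the result can be passed to an RIS parser as one block of text. To read from a file instead of a String, db.parse(new File(path)) can replace the InputSource call.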
Regarding "java - Parsing XML in Java to get only the text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44167076/