我有一个大XML
文件,以下是其中的摘录:
...
<LexicalEntry id="Ait~ifAq_1">
<Lemma partOfSpeech="n" writtenForm="اِتِّفاق"/>
<Sense id="Ait~ifAq_1_tawaAfuq_n1AR" synset="tawaAfuq_n1AR"/>
<WordForm formType="root" writtenForm="وفق"/>
</LexicalEntry>
<LexicalEntry id="tawaA&um__1">
<Lemma partOfSpeech="n" writtenForm="تَوَاؤُم"/>
<Sense id="tawaA&um__1_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="وأم"/>
</LexicalEntry>
<LexicalEntry id="tanaAgum_2">
<Lemma partOfSpeech="n" writtenForm="تناغُم"/>
<Sense id="tanaAgum_2_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="نغم"/>
</LexicalEntry>
<Synset baseConcept="3" id="tawaAfuq_n1AR">
<SynsetRelations>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hypernym" targets="ext_noun_NP_420"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="13971065-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
...
我想从中提取特定信息。对于给定的writtenForm
是否来自<Lemma>
或<WordForm>
,程序取值synset
来自<Sense>
其中writtenForm
(相同 <LexicalEntry>
)并搜索所有值 id
的<Synset>
与 synset
具有相同的值来自<Sense>
。之后,程序给出了Synset
的所有关系。 ,即显示 relType
的值并返回<LexicalEntry>
并查找值 synset
的<Sense>
谁具有相同的值 targets
然后显示其 writtenForm
.
我觉得有点复杂,但结果应该是这样的:
اِتِّفاق hyponym تَوَاؤُم, اِنْسِجام
由于内存消耗,解决方案之一是使用 Stream 读取器。但我不知道我应该如何继续得到我想要的东西。请帮助我。
最佳答案
SAX 解析器与 DOM 解析器不同。它仅查看当前的 item
,而无法查看 future 的项目,直到它们成为当前的 item
。当 XML 文件非常大时,它是您可以使用的众多工具之一。相反,还有很多。仅举几例:
SAX
解析器DOM
解析器JDOM
解析器DOM4J
解析器STAX
解析器
您可以找到所有这些教程 here .
我认为学习后直接使用 DOM4J
或 JDOM
进行商业产品。
SAX 解析器的逻辑是您有一个 MyHandler
类,它扩展了 DefaultHandler
和 @Overrides
它的一些方法:
XML 文件:
<?xml version="1.0"?>
<class>
<student rollno="393">
<firstname>dinkar</firstname>
<lastname>kad</lastname>
<nickname>dinkar</nickname>
<marks>85</marks>
</student>
<student rollno="493">
<firstname>Vaneet</firstname>
<lastname>Gupta</lastname>
<nickname>vinni</nickname>
<marks>95</marks>
</student>
<student rollno="593">
<firstname>jasvir</firstname>
<lastname>singn</lastname>
<nickname>jazz</nickname>
<marks>90</marks>
</student>
</class>
处理程序类:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class UserHandler extends DefaultHandler {
boolean bFirstName = false;
boolean bLastName = false;
boolean bNickName = false;
boolean bMarks = false;
@Override
public void startElement(String uri,
String localName, String qName, Attributes attributes)
throws SAXException {
if (qName.equalsIgnoreCase("student")) {
String rollNo = attributes.getValue("rollno");
System.out.println("Roll No : " + rollNo);
} else if (qName.equalsIgnoreCase("firstname")) {
bFirstName = true;
} else if (qName.equalsIgnoreCase("lastname")) {
bLastName = true;
} else if (qName.equalsIgnoreCase("nickname")) {
bNickName = true;
}
else if (qName.equalsIgnoreCase("marks")) {
bMarks = true;
}
}
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("student")) {
System.out.println("End Element :" + qName);
}
}
@Override
public void characters(char ch[],
int start, int length) throws SAXException {
if (bFirstName) {
System.out.println("First Name: "
+ new String(ch, start, length));
bFirstName = false;
} else if (bLastName) {
System.out.println("Last Name: "
+ new String(ch, start, length));
bLastName = false;
} else if (bNickName) {
System.out.println("Nick Name: "
+ new String(ch, start, length));
bNickName = false;
} else if (bMarks) {
System.out.println("Marks: "
+ new String(ch, start, length));
bMarks = false;
}
}
}
主要类:
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAXParserDemo {
public static void main(String[] args){
try {
File inputFile = new File("input.txt");
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
UserHandler userhandler = new UserHandler();
saxParser.parse(inputFile, userhandler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
关于java - 如何获取 XML 文件的特定信息,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41285845/