How do I extract numbering and text from a .docx file using Java and the Apache POI XWPF library?
I am using the following code:
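import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public static void readDocxFile() {
    // try-with-resources closes the stream and the document once, after the loop
    // (the original called fis.close() inside the loop on every iteration)
    try (FileInputStream fis = new FileInputStream("C:\\test.docx");
         XWPFDocument document = new XWPFDocument(fis)) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        for (XWPFParagraph para : paragraphs) {
            // getText() returns the paragraph text only, without the list number
            System.out.println(para.getText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}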
My code extracts only the text, as shown below:
CLIENT SERVICE SATISFACTION
Client Feedback System
Interlibrary Loans
Shelf Tidiness
Three Day Loans
Materials Availability Survey
Online help service
I need to extract the section numbers (the numbering) together with the text, as shown below:
1 CLIENT SERVICE SATISFACTION
1.1 Client Feedback System
1.1.1 Interlibrary Loans
1.1.2 Shelf Tidiness
1.1.3 Three Day Loans
1.2 Materials Availability Survey
1.3 Online help service
Best answer
To get the text of the .docx file, use the methods of XWPFParagraph (from the poi-ooxml API). To get the numbering of a paragraph, try the following code:
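// currentPara_Line is an XWPFParagraph; getPPr() and getNumPr() return null for unnumbered paragraphs
BigInteger currentParagraphNumberingID = currentPara_Line.getCTP().getPPr().getNumPr().getNumId().getVal();
// Resolve the numbering instance (numId) to its abstract numbering definition
BigInteger currentParagraphAbstractNumID2 = currentPara_Line.getDocument().getNumbering().getAbstractNumID(currentParagraphNumberingID);
XWPFAbstractNum currentParagraphAbstractNum = currentPara_Line.getDocument().getNumbering().getAbstractNum(currentParagraphAbstractNumID2);
// The underlying XML bean holds the per-level format definitions
CTAbstractNum currentParagraphAbstractNumFormatting = currentParagraphAbstractNum.getCTAbstractNum();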
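The code above yields the numbering definition, not the rendered label such as "1.1.2"; as far as I know, POI does not compute that string, so the counters have to be tracked by the caller. The following is a minimal sketch of that counting step only. The class `OutlineNumbering` is a hypothetical helper, not part of Apache POI, and it assumes a single list with decimal numbering, levels joined by dots, and every level starting at 1.

```java
// Hypothetical helper (not part of Apache POI): rebuilds labels such as
// "1", "1.1", "1.1.1" from the zero-based list level of each numbered paragraph.
class OutlineNumbering {
    // Word lists have at most 9 levels (ilvl 0..8).
    private static final int MAX_LEVELS = 9;
    private final int[] counters = new int[MAX_LEVELS];

    // Returns the label for the next numbered paragraph at the given level.
    String next(int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        counters[level]++;
        // A new item at this level restarts all deeper levels.
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            counters[i] = 0;
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                label.append('.');
            }
            // Assumption: a parent level that was never used is shown as 1.
            label.append(Math.max(1, counters[i]));
        }
        return label.toString();
    }

    public static void main(String[] args) {
        // Levels as they would be read from the sample document in the question.
        int[] levels = {0, 1, 2, 2, 2, 1, 1};
        String[] texts = {
            "CLIENT SERVICE SATISFACTION", "Client Feedback System",
            "Interlibrary Loans", "Shelf Tidiness", "Three Day Loans",
            "Materials Availability Survey", "Online help service"
        };
        OutlineNumbering numbering = new OutlineNumbering();
        for (int i = 0; i < levels.length; i++) {
            System.out.println(numbering.next(levels[i]) + " " + texts[i]);
        }
    }
}
```

With POI, the level would typically come from `para.getNumIlvl()` for paragraphs where `para.getNumID()` is not null. Numbering applied through a paragraph style may not show up on the paragraph itself, and other formats (letters, roman numerals, custom level text) would need the definitions in `CTAbstractNum`, which this sketch ignores.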
Regarding java - how to extract numbering and text from a .docx file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38067470/