java - How to extract numbering and text from a .docx file

Tags: java apache-poi extract docx

How can I extract the numbering and the text from a .docx file using Java and the Apache POI XWPF library?

I am using the following code:

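import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public static void readDocxFile() {
    // try-with-resources closes the document and the stream once, after the loop
    // (the stream was previously closed inside the loop body)
    try (FileInputStream fis = new FileInputStream("C:\\test.docx");
         XWPFDocument document = new XWPFDocument(fis)) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();

        for (XWPFParagraph para : paragraphs) {
            // getText() returns the paragraph text only, without the list number
            System.out.println(para.getText());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}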

My code extracts only the text, as shown below:

CLIENT SERVICE SATISFACTION
Client Feedback System
Interlibrary Loans
Shelf Tidiness
Three Day Loans
Materials Availability Survey
Online help service

I need to extract the section numbers along with the text, as shown below:

1    CLIENT SERVICE SATISFACTION
1.1   Client Feedback System
1.1.1 Interlibrary Loans
1.1.2 Shelf Tidiness
1.1.3 Three Day Loans
1.2   Materials Availability Survey
1.3   Online help service

Best answer

To get the text of a .docx file, use the methods of XWPFParagraph (from the poi-ooxml API). To get the numbering of a paragraph, try the following code:

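// currentPara_Line is an XWPFParagraph; getPPr() and getNumPr() return null when the paragraph has no direct numbering
BigInteger currentParagraphNumberingID = currentPara_Line.getCTP().getPPr().getNumPr().getNumId().getVal();
// Map the numbering instance (numId) to the ID of its abstract numbering definition
BigInteger currentParagraphAbstractNumID2 = currentPara_Line.getDocument().getNumbering().getAbstractNumID(currentParagraphNumberingID);
XWPFAbstractNum currentParagraphAbstractNum = currentPara_Line.getDocument().getNumbering().getAbstractNum(currentParagraphAbstractNumID2);
// The CTAbstractNum carries the per-level formatting of the list
CTAbstractNum currentParagraphAbstractNumFormatting = currentParagraphAbstractNum.getCTAbstractNum();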
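
Note that the calls above give you the numbering definition, not the rendered label such as "1.1.2". As far as I know, the .docx file does not store that label; Word computes it by counting list items. A minimal sketch of that counting follows. It is plain Java with no POI dependency, and it assumes a single decimal multilevel list whose levels all start at 1. The class name OutlineNumberer and its methods are hypothetical names made up for this illustration. In a real program the level would come from XWPFParagraph.getNumIlvl(), which can return null when the numbering is defined on the paragraph style instead of the paragraph itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Apache POI): turns a sequence of list
// levels into dotted labels such as "1", "1.1", "1.1.2".
public class OutlineNumberer {
    // One counter per list level; index 0 is the top level (ilvl 0).
    // Word supports nine levels per list.
    private final int[] counters = new int[9];

    // Returns the label for the next paragraph at the given level.
    public String next(int ilvl) {
        counters[ilvl]++;
        // A new item at this level restarts all deeper levels.
        for (int i = ilvl + 1; i < counters.length; i++) {
            counters[i] = 0;
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= ilvl; i++) {
            if (i > 0) {
                label.append('.');
            }
            label.append(counters[i]);
        }
        return label.toString();
    }

    // Convenience method: labels a whole document given parallel arrays
    // of levels and paragraph texts.
    public static List<String> number(int[] levels, String[] texts) {
        OutlineNumberer numberer = new OutlineNumberer();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < levels.length; i++) {
            lines.add(numberer.next(levels[i]) + " " + texts[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Levels assumed for the headings in the question (0 = top level).
        int[] levels = {0, 1, 2, 2, 2, 1, 1};
        String[] texts = {
            "CLIENT SERVICE SATISFACTION", "Client Feedback System",
            "Interlibrary Loans", "Shelf Tidiness", "Three Day Loans",
            "Materials Availability Survey", "Online help service"
        };
        for (String line : number(levels, texts)) {
            System.out.println(line);
        }
    }
}
```

To use it with POI, skip paragraphs whose getNumID() is null, pass getNumIlvl().intValue() to next(), and prepend the result to para.getText(). Documents with several lists, non-decimal formats, or custom start values would need one counter set per numId plus the format details from the CTAbstractNum above.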

Regarding "java - How to extract numbering and text from a .docx file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38067470/
