java - Adding HTML markup to a PDF with Apache PDFBox

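I have been using PDFBox together with EasyTable, which extends PDFBox to draw data tables. My problem is that I have a Java object holding a string of HTML, and I need to add that content to a PDF with PDFBox. Digging through the documentation has not turned up anything useful.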

The code below is a hello-world snippet. I would like the text to appear in the PDF with H1 formatting.

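import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Assumes PDFBox 2.x; the class name is a placeholder for the enclosing code that was cut off
public class HelloWorldPdf {
    public static void main(String[] args) throws IOException {
        // Create a document and add a page to it
        PDDocument document = new PDDocument();
        PDPage page = new PDPage();
        document.addPage(page);

        // Create a new font object selecting one of the PDF base fonts
        PDFont font = PDType1Font.HELVETICA_BOLD;

        // Start a new content stream which will "hold" the content to be created
        PDPageContentStream contentStream = new PDPageContentStream(document, page);

        // Begin a text block, select the font, move the cursor, and draw the text
        contentStream.beginText();
        contentStream.setFont(font, 12);
        // newLineAtOffset and showText replace the deprecated moveTextPositionByAmount and drawString
        contentStream.newLineAtOffset(100, 700);
        // The markup is written literally; PDFBox does not interpret HTML tags
        contentStream.showText("<h1>HelloWorld</h1>");
        contentStream.endText();

        // Make sure that the content stream is closed
        contentStream.close();

        // Save the results and ensure that the document is properly closed
        document.save("Hello World.pdf");
        document.close();
    }
}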

Best answer

Use Jericho (the Jericho HTML Parser) to render the HTML as plain text while mapping the output of each tag correctly.

Sample

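// Renders an HTML string as plain text using Jericho's Renderer
public String extractAllText(String htmlText) {
    return new net.htmlparser.jericho
            .Source(htmlText)
            .getRenderer()
            // Effectively disables line wrapping in the rendered output
            .setMaxLineLength(Integer.MAX_VALUE)
            .setNewLine(null)
            .toString();
}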
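
The text rendered from HTML can still contain line breaks, and as far as I know PDFBox's showText rejects control characters such as a line feed, so the text usually has to be split into separate lines before drawing. Below is a minimal sketch of that step using only the standard library. The class name PlainTextLines, the method toLines, and the character-count wrapping are assumptions for illustration; they are not part of PDFBox or Jericho.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of PDFBox or Jericho.
public class PlainTextLines {

    /**
     * Splits plain text into lines of at most maxChars characters,
     * breaking at whitespace. Blank lines are dropped, and a single
     * word longer than maxChars is kept whole on its own line.
     */
    public static List<String> toLines(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...)
        for (String paragraph : text.split("\\R")) {
            StringBuilder current = new StringBuilder();
            for (String word : paragraph.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // happens for blank paragraphs
                }
                // Start a new line if adding this word would exceed the limit
                if (current.length() > 0
                        && current.length() + 1 + word.length() > maxChars) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
            if (current.length() > 0) {
                lines.add(current.toString());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : toLines("Hello World\nrendered from HTML", 11)) {
            System.out.println(line);
        }
    }
}
```

With PDFBox 2.x, each returned line could then be drawn with contentStream.showText(line) followed by contentStream.newLineAtOffset(0, -leading), where leading is a line height you choose. Counting characters is only an approximation of the rendered width; measuring with the font's metrics would be more precise.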

Include the dependency in your Gradle or Maven build:

compile group: 'net.htmlparser.jericho', name: 'jericho-html', version: '3.4'

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58221833/
