We use iText+XHTMLRenderer to convert large HTML files to PDF. Today it managed to hog all the resources in our development environment and make it unusable:
This is a jstack dump:
02aaabc585000 nid=0x3af7 runnable [0x00002aaaf0269000]
java.lang.Thread.State: RUNNABLE
at java.awt.geom.Path2D$Double.rectCrossings(Path2D.java:1288)
at java.awt.geom.Path2D.intersects(Path2D.java:2290)
at java.awt.geom.Path2D.intersects(Path2D.java:2314)
at org.xhtmlrenderer.layout.BoxCollector.intersectsAggregateBounds(BoxCollector.java:90)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:121)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:139)
at org.xhtmlrenderer.layout.BoxCollector.collect(BoxCollector.java:46)
at org.xhtmlrenderer.layout.Layer.paint(Layer.java:314)
at org.xhtmlrenderer.pdf.ITextRenderer.paintPage(ITextRenderer.java:384)
at org.xhtmlrenderer.pdf.ITextRenderer.writePDF(ITextRenderer.java:348)
at org.xhtmlrenderer.pdf.ITextRenderer.createPDF(ITextRenderer.java:315)
at org.xhtmlrenderer.pdf.ITextRenderer.createPDF(ITextRenderer.java:246)
This is a heap histogram dump:
num instances bytes class name
1: 1344539 776639912 [B
2: 1798853 301253344 [C
3: 535059 72768024 org.xhtmlrenderer.render.InlineLayoutBox
4: 762761 52412032 [Ljava.lang.Object;
5: 1519522 48624704 java.lang.String
6: 1149491 45979640 com.someco.p.d
7: 203533 38674984 [I
8: 216490 31313568 <constMethodKlass>
9: 216490 29455216 <methodKlass>
10: 387065 24772160 org.xhtmlrenderer.render.InlineBox
11: 23732 23915216 <constantPoolKlass>
12: 727350 23275200 java.awt.Rectangle
13: 243878 23095936 [Ljava.util.HashMap$Entry;
14: 147045 22350840 org.xhtmlrenderer.render.LineBox
15: 667914 21373248 java.util.HashMap$Entry
16: 855194 20524656 java.util.concurrent.LinkedBlockingQueue$Node
17: 23732 18543256 <instanceKlassKlass>
18: 537890 17212480 org.xhtmlrenderer.css.style.derived.RectPropertySet
19: 688836 16532064 org.xhtmlrenderer.layout.PaintingInfo
20: 688836 16532064 java.awt.Dimension
21: 264061 15254448 <symbolKlass>
22: 268028 15009568 org.xhtmlrenderer.render.InlineText
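Clearly this happened because a (very, very) large HTML file was being converted. It got us thinking about which is better: blocking large HTML files from being converted, or finding a more efficient way to convert HTML to PDF without first rendering the HTML to a "screen" (which is basically what XHTMLRenderer does).
Googling and reading around the web did not turn up any good options. They all come from sketchy-looking companies that we don't know whether we can trust. Does anyone have an alternative?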
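For the first option raised above (refusing oversized input), a minimal sketch of a pre-flight check might look like the following. The class `HtmlSizeGuard`, its method names, and the 5 MiB limit are all hypothetical and not part of iText or XHTMLRenderer. File size is also only a rough proxy: the histogram suggests memory is driven by the number of layout boxes, so a real limit would need to be tuned against actual documents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check; not part of iText or XHTMLRenderer. */
public class HtmlSizeGuard {

    // Assumed limit (5 MiB); tune it against the heap available to the renderer.
    static final long DEFAULT_MAX_BYTES = 5L * 1024 * 1024;

    /** Returns true when a document of sizeBytes may be handed to the renderer. */
    static boolean isWithinLimit(long sizeBytes, long maxBytes) {
        return sizeBytes >= 0 && sizeBytes <= maxBytes;
    }

    /** Throws before any layout work starts if the file exceeds the limit. */
    static void requireConvertible(Path html, long maxBytes) throws IOException {
        long size = Files.size(html);
        if (!isWithinLimit(size, maxBytes)) {
            throw new IllegalArgumentException(
                "Refusing to convert " + html + ": " + size
                    + " bytes exceeds limit of " + maxBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: HtmlSizeGuard <file.html>");
            return;
        }
        requireConvertible(Paths.get(args[0]), DEFAULT_MAX_BYTES);
        System.out.println("OK to convert: " + args[0]);
        // The existing ITextRenderer.createPDF(...) call would follow here.
    }
}
```

Rejecting the file before the renderer is constructed keeps the check cheap, but it does nothing for a small file that expands into a very deep box tree, so it is best treated as a first line of defence rather than a fix.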
Best answer
http://sourceforge.net/projects/xmlworker/
XMLWorker used to ship with iText as HTMLWorker. It may not cope with overly complex HTML, but give it a try.
The current solution is, of course, to use pdfHTML (an iText7 add-on) together with iText7.
Regarding "java - a more efficient way to convert HTML to PDF than XHTMLRenderer+iText", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9686882/