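We are using Tika 1.1 to extract content from XLSM files. We have two server instances. On one of them the file content is extracted correctly, but on the other I get a zip bomb exception for the same file. Both instances use the same standalone Tika jar, yet I cannot pin down the cause.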
I'm not sure whether the SAX configuration is causing a problem at runtime (I'm not very familiar with SAX). How can I debug this?
Caused by: org.apache.tika.exception.TikaException: Zip bomb detected!
    at org.apache.tika.sax.SecureContentHandler.throwIfCauseOf(SecureContentHandler.java:192)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:123)
    at org.apache.tika.Tika.parseToString(Tika.java:380)
    at com.ptc.search.solr.contentReader.contentExtraction.TikaExtractor.getContent(TikaExtractor.java:36)
    ... 45 more
Caused by: org.apache.tika.sax.SecureContentHandler$SecureSAXException: Suspected zip bomb: 878 levels of XML element nesting
    at org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:234)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:244)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
    at org.apache.tika.sax.XHTMLContentHandler.element(XHTMLContentHandler.java:313)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.extractHeaderFooter(XSSFExcelExtractorDecorator.java:145)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:129)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:104)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:110)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    ... 47 more
Best answer
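After debugging the Tika code, I realized that I had set maxStringLength on the WriteOutContentHandler, and once that limit was reached the code threw the zip bomb error. An accurate error message would have led me to the cause much sooner. Anyway, thanks everyone for the input. We will definitely plan to migrate to the latest version.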
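The confusion above comes from two unrelated limits (a write limit on extracted text and a nesting-depth guard) ending up behind the same "zip bomb" message. The following is a minimal, stdlib-only SAX sketch of the clearer behaviour, where each limit reports its own exception type so the caller can keep partial text in one case and reject the document in the other. It does not use Tika and is not Tika's implementation; `LimitedTextHandler`, `WriteLimitExceededException`, `NestingTooDeepException` and `extract` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitDemo {

    // Hypothetical exception: the configured write limit was reached (not a security issue).
    static class WriteLimitExceededException extends SAXException {
        WriteLimitExceededException(int limit) {
            super("Write limit of " + limit + " characters reached");
        }
    }

    // Hypothetical exception: element nesting is suspiciously deep.
    static class NestingTooDeepException extends SAXException {
        NestingTooDeepException(int depth) {
            super("Suspected zip bomb: " + depth + " levels of XML element nesting");
        }
    }

    // Hypothetical handler: collects text, enforcing a character limit and a maximum nesting depth.
    static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final int writeLimit;
        private final int maxDepth;
        private int depth = 0;

        LimitedTextHandler(int writeLimit, int maxDepth) {
            this.writeLimit = writeLimit;
            this.maxDepth = maxDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            depth++;
            if (depth > maxDepth) {
                throw new NestingTooDeepException(depth);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            int room = writeLimit - text.length();
            if (length > room) {
                // Keep whatever still fits, then signal the limit with its own exception type.
                text.append(ch, start, Math.max(room, 0));
                throw new WriteLimitExceededException(writeLimit);
            }
            text.append(ch, start, length);
        }

        String getText() {
            return text.toString();
        }
    }

    // Parses the XML and describes the outcome; each failure cause gets its own branch.
    static String extract(String xml, int writeLimit, int maxDepth) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit, maxDepth);
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return "OK: " + handler.getText();
        } catch (WriteLimitExceededException e) {
            // Partial text collected before the limit is still usable.
            return "TRUNCATED: " + handler.getText();
        } catch (NestingTooDeepException e) {
            return "REJECTED: " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parsing failure", e);
        }
    }

    public static void main(String[] args) {
        // OK: hello world
        System.out.println(extract("<a><b>hello world</b></a>", 100, 10));
        // TRUNCATED: hello
        System.out.println(extract("<a><b>hello world</b></a>", 5, 10));
        // REJECTED: Suspected zip bomb: 3 levels of XML element nesting
        System.out.println(extract("<a><b><c>x</c></b></a>", 100, 2));
    }
}
```

In real Tika code the equivalent check is, as far as I know, `WriteOutContentHandler.isWriteLimitReached(Throwable)`, which lets a caller recognise that a SAX failure was only the write limit; confirm it against the Javadoc of the Tika version you actually deploy.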
Should we file a defect in Jira so that a proper error message is thrown?
Regarding "java - Tika zip bomb exception", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21882405/