java - What is causing these ParseError exceptions when reading from an AWS SQS queue in my Storm cluster?

Tags: java amazon-web-services amazon-sqs apache-storm jaxp

I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue, and I consistently get this exception when doing so:

2013-12-02 02:21:38 executor [ERROR] 
java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
        at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
        at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
        at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
        ... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
        ... 10 more

I've debugged the data on the queue and everything looks fine. I don't understand why the API's XML response would cause these problems. Any ideas?

Best answer

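Answering my own question here for posterity.

There is currently a bug in the XML expansion limit handling in both Oracle's and OpenJDK's Java: the entity expansion counter is shared across parsers, so it reaches the default limit of 64000 after many XML documents have been parsed, even when no single document comes close to it.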

  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123

Although I believed our version (6b27-1.12.6-1ubuntu0.12.04.4) was not affected, running the sample code from the OpenJDK bug report confirmed that we were susceptible to the bug.
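For anyone who wants to run a similar check on their own JVM, here is a minimal sketch of the idea. It is not the code from the bug report: the class name, method name, and document count are my own, and it assumes the counter leaks across documents parsed in the same JVM, as described above. It parses many tiny, entity-free documents and reports where the first failure occurs, if any.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not taken from the OpenJDK bug report.
final class EntityLimitCheck {

    /**
     * Parses the same small document `documents` times and returns how many
     * documents were parsed before the first XMLStreamException.
     * A return value equal to `documents` means no limit was hit.
     */
    static int parseRepeatedly(int documents) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        String xml = "<?xml version=\"1.0\"?><msg>hello</msg>";
        for (int i = 0; i < documents; i++) {
            try {
                XMLStreamReader reader =
                        factory.createXMLStreamReader(new StringReader(xml));
                while (reader.hasNext()) {
                    reader.next(); // walk the whole document
                }
                reader.close();
            } catch (XMLStreamException e) {
                System.err.println("Failed at document " + i + ": " + e.getMessage());
                return i;
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        int n = 100000; // comfortably above the 64000 default limit
        int parsed = parseRepeatedly(n);
        System.out.println(parsed == n
                ? "No limit hit after " + n + " documents"
                : "Limit hit after " + parsed + " documents");
    }
}
```

On a JVM without the bug, every document should parse. A failure partway through, with the JAXP00010001 message, suggests the JVM is affected.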

To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers. Adding the following to storm.yaml across my cluster mitigated the problem.

supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
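
To confirm that the option actually reached a worker JVM, the system property can be read back at startup, for example from the spout's open() method. This is only a sketch: the class and method names are hypothetical, and the only thing it relies on is the standard System.getProperty call.

```java
// Hypothetical helper for logging the effective setting inside a worker JVM.
final class EntityLimitSetting {

    static final String PROPERTY = "jdk.xml.entityExpansionLimit";

    /** Returns a human-readable description of the current setting. */
    static String describe() {
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            return PROPERTY + " is not set (JDK default applies)";
        }
        return PROPERTY + "=" + value;
    }

    public static void main(String[] args) {
        // Run with: java -Djdk.xml.entityExpansionLimit=0 EntityLimitSetting
        System.out.println(describe());
    }
}
```

If the output shows the property as not set, the childopts line was probably not picked up, for instance because the supervisor was not restarted after storm.yaml was edited.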

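I should point out that this technically opens you up to denial-of-service attacks, since a value of 0 removes the entity expansion limit entirely. Because our XML documents only come from SQS, I'm not worried about someone forging malicious XML to kill our workers.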

Source: Stack Overflow, https://stackoverflow.com/questions/20482331/
