azure - Exception while subscribing to Event Hub with Spark

Tags: azure apache-spark events azure-eventhub

    INFO ConnectionHandler: onConnectionLocalClose: hostname[edgeeventhub.servicebus.windows.net:5671], errorCondition[null, null]
    19/01/09 02:34:00 INFO ConnectionHandler: onConnectionUnbound: hostname[edgeeventhub.servicebus.windows.net:5671], state[CLOSED], remoteState[ACTIVE]
    19/01/09 02:34:00 INFO SessionHandler: entityName[mgmt-session], condition[Error{condition=null, description='null', info=null}]
    19/01/09 02:34:00 INFO SessionHandler: entityName[cbs-session], condition[Error{condition=null, description='null', info=null}]
    19/01/09 02:34:00 INFO SessionHandler: entityName[mgmt-session]
    19/01/09 02:34:00 INFO SessionHandler: entityName[cbs-session]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/17]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/28]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/19]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/29]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/1]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/8]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/16]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/9]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/14]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/24]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/25]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/15]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/26]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/18]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/27]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/4]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/5]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/7]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/3]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/6]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/22]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/0]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/21]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/20]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/23]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/2]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/10]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/13]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/31]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/12]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/11]
    19/01/09 02:34:00 INFO SessionHandler: entityName[edgeeventhub/ConsumerGroups/$Default/Partitions/30]
    19/01/09 02:34:00 ERROR StreamExecution: Query [id = e5b72043-d85a-4004-9f1c-dc3aaa77a0bc, runId = 75261962-7648-4d7d-90e9-b2ed0906d2b7] terminated with error
    java.util.concurrent.ExecutionException: com.microsoft.azure.eventhubs.EventHubException: connection aborted
            at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
            at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
            at org.apache.spark.eventhubs.client.EventHubsClient.getRunTimeInfo(EventHubsClient.scala:112)
            at org.apache.spark.eventhubs.client.EventHubsClient.boundedSeqNos(EventHubsClient.scala:149)
            at org.apache.spark.sql.eventhubs.EventHubsSource$$anonfun$6.apply(EventHubsSource.scala:130)
            at org.apache.spark.sql.eventhubs.EventHubsSource$$anonfun$6.apply(EventHubsSource.scala:128)
            at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
            at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
            at scala.collection.Iterator$class.foreach(Iterator.scala:893)
            at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
            at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
            at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
            at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
            at scala.collection.AbstractTraversable.map(Traversable.scala:104)
            at org.apache.spark.sql.eventhubs.EventHubsSource.getOffset(EventHubsSource.scala:128)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$10$$anonfun$apply$6.apply(StreamExecution.scala:521)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$10$$anonfun$apply$6.apply(StreamExecution.scala:521)
            at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
            at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$10.apply(StreamExecution.scala:520)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$10.apply(StreamExecution.scala:518)
            at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
            at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
            at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
            at scala.collection.AbstractTraversable.map(Traversable.scala:104)
            at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:518)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:301)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
            at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
            at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
            at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
            at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
            at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
    Caused by: com.microsoft.azure.eventhubs.EventHubException: connection aborted
            at com.microsoft.azure.eventhubs.impl.ExceptionUtil.toException(ExceptionUtil.java:58)
            at com.microsoft.azure.eventhubs.impl.RequestResponseChannel$ResponseHandler.onClose(RequestResponseChannel.java:250)
            at com.microsoft.azure.eventhubs.impl.BaseLinkHandler.processOnClose(BaseLinkHandler.java:50)
            at com.microsoft.azure.eventhubs.impl.MessagingFactory.onConnectionError(MessagingFactory.java:266)
            at com.microsoft.azure.eventhubs.impl.ConnectionHandler.onTransportError(ConnectionHandler.java:105)
            at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191)
            at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
            at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
            at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
            at com.microsoft.azure.eventhubs.impl.MessagingFactory$RunReactor.run(MessagingFactory.java:462)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    19/01/09 02:34:00 INFO EventHubsClient: close: Closing EventHubsClient.
    19/01/09 02:34:00 INFO ClientConnectionPool: Client returned. EventHub name: edgeeventhub. Total clients: 3. Available clients: 3
    Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: com.microsoft.azure.eventhubs.EventHubException: connection aborted
    === Streaming Query ===
    Identifier: [id = e5b72043-d85a-4004-9f1c-dc3aaa77a0bc, runId = 75261962-7648-4d7d-90e9-b2ed0906d2b7]
    Current Committed Offsets: {org.apache.spark.sql.eventhubs.EventHubsSource@e8217e7: {"edgeeventhub":{"23":16394,"8":16404,"17":16406,"26":16396,"11":16403,"29":16400,"2":16408,"20":16400,"5":16401,"14":16401,"4":16405,"13":16404,"31":16400,"22":16394,"7":16399,"16":16406,"25":16400,"10":16405,"1":16402,"28":16411,"19":16405,"27":16406,"9":16400,"18":16408,"12":16401,"3":16403,"21":16397,"30":16398,"15":16407,"6":16405,"24":16391,"0":16406}}}
    Current Available Offsets: {org.apache.spark.sql.eventhubs.EventHubsSource@e8217e7: {"edgeeventhub":{"23":16394,"8":16404,"17":16406,"26":16396,"11":16403,"29":16400,"2":16408,"20":16400,"5":16401,"14":16401,"4":16405,"13":16404,"31":16400,"22":16394,"7":16399,"16":16406,"25":16400,"10":16405,"1":16402,"28":16411,"19":16405,"27":16406,"9":16400,"18":16408,"12":16401,"3":16403,"21":16397,"30":16398,"15":16407,"6":16405,"24":16391,"0":16406}}}

    Current State: ACTIVE
    Thread State: RUNNABLE

I am hitting the exception above in a long-running Spark job that reads data from Event Hubs and writes it to Redis; a separate scheduler then pulls the accumulated data from Redis and saves it to a database. The pipeline runs fine for 8-10 hours, after which the error above appears. I am using the azure-eventhubs-spark_2.11 connector, version 2.2.1. Please advise, since I am facing this issue in production.

    // Configure the Event Hubs source, capping the number of events per trigger
    val maxEventTrigger: Long = Constants.maxEventTrigger.toLong
    val customEventhubParameters = EventHubsConf(connStr).setMaxEventsPerTrigger(maxEventTrigger)

    // Define the streaming source (evaluated lazily until the query is started)
    val incomingStream = spark.readStream.format("eventhubs").options(customEventhubParameters.toMap).load()
    logger.info("Event Hubs streaming source has been created successfully")

    // Cast the raw columns into usable types and keep only the fields we need
    val messages = incomingStream
      .withColumn("Offset", $"offset".cast(LongType))
      .withColumn("Time (readable)", $"enqueuedTime".cast(TimestampType))
      .withColumn("Timestamp", $"enqueuedTime".cast(LongType))
      .withColumn("Body", $"body".cast(StringType))
      .select("Offset", "Time (readable)", "Timestamp", "Body")

    implicit val formats = DefaultFormats

    val ob = new EventhubMaster()
    ob.execute(messages)
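
For context, `EventhubMaster.execute` presumably starts the streaming query whose failure is shown in the log; the `StreamingQueryException` surfaces where the driver waits on the query. A rough sketch of what that start might look like, with a placeholder console sink and checkpoint path standing in for the real Redis write:

    // Hypothetical sketch of how the query behind EventhubMaster.execute might be
    // started. The console sink and checkpoint path are placeholders only; the
    // real job writes the rows to Redis.
    val query = messages.writeStream
      .outputMode("append")
      .format("console")
      .option("checkpointLocation", "/tmp/eventhub-checkpoint")
      .start()

    // A "connection aborted" EventHubException terminates the query and is
    // rethrown here wrapped in StreamingQueryException, matching the log above.
    query.awaitTermination()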

Accepted answer

Creating a dedicated consumer group in the Event Hub for each Spark job resolved the issue.
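
For reference, a minimal sketch of that change with the azure-eventhubs-spark connector, assuming a dedicated consumer group (the name `spark-job-1` is only an example) has already been created on the Event Hub:

    // Point this Spark job at its own consumer group instead of $Default, so it
    // no longer competes with other readers for the same partition receivers.
    // "spark-job-1" is a hypothetical group name; create it on the Event Hub first.
    val dedicatedConf = EventHubsConf(connStr)
      .setConsumerGroup("spark-job-1")
      .setMaxEventsPerTrigger(maxEventTrigger)

    val incomingStream = spark.readStream
      .format("eventhubs")
      .options(dedicatedConf.toMap)
      .load()

With one group per job, concurrent readers no longer contend for the same partition receivers on `$Default`, which is a common reason a second reader gets its AMQP links torn down.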

Regarding "azure - Exception while subscribing to Event Hub with Spark", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54103321/
