apache-spark - How many consumers does each direct stream create to read records?

Tags: apache-spark apache-kafka spark-streaming

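I am using Spark Streaming to read data from Kafka (using the Kafka direct stream API).

How many Kafka consumers are instantiated for the stream? Is the number of Kafka consumers equal to the number of executors? Does each executor instantiate one Kafka consumer (with the same group ID)?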

Best answer

With the direct approach, the number of consumers will be exactly the same as the number of Kafka partitions:

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata

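A separate consumer is initialized for each partition, so the count follows the number of Kafka partitions rather than the number of executors.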
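One way to observe the 1:1 mapping is to compare the number of Spark partitions in each batch with the number of Kafka offset ranges behind it. The following is a minimal sketch using the spark-streaming-kafka-0-10 integration. The broker address, topic name, group ID, app name, and batch interval are all placeholder assumptions, and the job needs a reachable Kafka broker to produce output.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object DirectStreamPartitionCount {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and batch interval; the master is expected from spark-submit.
    val conf = new SparkConf().setAppName("direct-stream-partition-count")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder broker address and group ID.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Placeholder topic name.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("example-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // Each batch RDD carries one offset range per Kafka partition it reads.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      println(s"Spark partitions: ${rdd.getNumPartitions}, Kafka offset ranges: ${offsetRanges.length}")
      offsetRanges.foreach { r =>
        println(s"${r.topic} partition ${r.partition}: offsets ${r.fromOffset} until ${r.untilOffset}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the two counts printed for each batch match the partition count of the topic, that is consistent with the answer above. Changing the number of executors should not change either count.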

Regarding "apache-spark - How many consumers does each direct stream create to read records?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44686366/
