scala - 具有 SASL_SSL 身份验证的 Kafka Spark 结构化流

标签 scala apache-spark apache-kafka spark-structured-streaming spark-kafka-integration

我一直在尝试使用 Spark Structured Streaming API 通过 SASL_SSL 连接到 Kafka 集群。我已将 jaas.conf 文件传递​​给执行者。看来我无法设置 keystore 和信任库身份验证的值。

我尝试传递 thisspark link 中提到的值

另外,尝试将它传递给 link 中的代码

还是没有运气。

这是日志

20/02/28 10:00:53 INFO streaming.StreamExecution: Starting [id = e176f5e7-7157-4df5-93ce-1e267bae6125, runId = 03225a69-ec00-45d9-8092-1467da34980f]. Use flight/checkpoint to store the query checkpoint.
20/02/28 10:00:53 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
20/02/28 10:00:53 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/02/28 10:00:53 INFO server.AbstractConnector: Stopped Spark@46202f7b{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
20/02/28 10:00:53 INFO consumer.ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [broker1:9093, broker2:9093]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id =
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 1
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id = spark-kafka-source-93d170e9-977c-40fc-9e5d-790d253fcff5-409016337-driver-0
        retry.backoff.ms = 100
        ssl.secure.random.implementation = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = SASL_SSL
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = earliest

20/02/28 10:00:53 INFO ui.SparkUI: Stopped Spark web UI at http://<Server>:41037
20/02/28 10:00:53 ERROR streaming.StreamExecution: Query [id = e176f5e7-7157-4df5-93ce-1e267bae6125, runId = 03225a69-ec00-45d9-8092-1467da34980f] terminated with error
org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
        at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:702)
        at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:557)
        at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:540)
        at org.apache.spark.sql.kafka010.SubscribeStrategy.createConsumer(ConsumerStrategy.scala:62)
        at org.apache.spark.sql.kafka010.KafkaOffsetReader.createConsumer(KafkaOffsetReader.scala:297)
        at org.apache.spark.sql.kafka010.KafkaOffsetReader.<init>(KafkaOffsetReader.scala:78)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:88)
        at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:243)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:158)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:155)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:155)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:153)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
        at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:153)
        at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:147)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:276)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner  authentication information from the user
        at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:86)
        at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:70)
        at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:83)
        at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:623)
        ... 22 more
Caused by: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner  authentication information from the user
        at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Unknown Source)
        at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Unknown Source)
        at com.sun.security.auth.module.Krb5LoginModule.login(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at javax.security.auth.login.LoginContext.invoke(Unknown Source)
        at javax.security.auth.login.LoginContext.access$000(Unknown Source)
        at javax.security.auth.login.LoginContext$4.run(Unknown Source)
        at javax.security.auth.login.LoginContext$4.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokePriv(Unknown Source)
        at javax.security.auth.login.LoginContext.login(Unknown Source)
        at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:69)
        at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:110)
        at org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:46)
        at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:68)
        at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:78)
        ... 25 more
20/02/28 10:00:53 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
20/02/28 10:00:53 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/02/28 10:00:53 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
20/02/28 10:00:53 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/02/28 10:00:53 INFO memory.MemoryStore: MemoryStore cleared
20/02/28 10:00:53 INFO storage.BlockManager: BlockManager stopped
20/02/28 10:00:53 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/02/28 10:00:53 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/02/28 10:00:53 INFO spark.SparkContext: Successfully stopped SparkContext
20/02/28 10:00:53 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
20/02/28 10:00:53 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
20/02/28 10:00:53 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://nameservice1/user/hasif.subair/.sparkStaging/application_1582866369627_0029
20/02/28 10:00:53 INFO util.ShutdownHookManager: Shutdown hook called
20/02/28 10:00:53 INFO util.ShutdownHookManager: Deleting directory /yarn/nm/usercache/hasif.subair/appcache/application_1582866369627_0029/spark-5addfec0-a99f-49e1-b9d1-671c331efb40

代码

    val rawData = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9093, broker2:9093")
      .option("subscribe", "hasif_test")
      .option("spark.executor.extraJavaOptions", "-Djava.security.auth.login.config=jaas.conf")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("ssl.truststore.location", "/etc/connect_ts/truststore.jks")
      .option("ssl.truststore.password", "<PASSWORD>")
      .option("ssl.keystore.location", "/etc/connect_ts/keystore.jks")
      .option("ssl.keystore.password", "<PASSWORD>")
      .option("ssl.key.password", "<PASSWORD>")
      .load()
    rawData.writeStream.option("path", "flight/output")
      .option("checkpointLocation", "flight/checkpoint").format("csv").start()

Spark 提交

spark2-submit --master yarn --deploy-mode cluster \
--conf spark.yarn.keytab=hasif.subair.keytab \
--conf spark.yarn.principal=hasif.subair@TEST.ABC \ 
--files /home/hasif.subair/jaas.conf \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
--conf "spark.kafka.clusters.hasif.ssl.truststore.location=/etc/ts/truststore.jks" \
--conf "spark.kafka.clusters.hasif.ssl.truststore.password=testcluster" \
--conf "spark.kafka.clusters.hasif.ssl.keystore.location=/etc/ts/keystore.jks" \
--conf "spark.kafka.clusters.hasif.ssl.keystore.password=testcluster" \
--conf "spark.kafka.clusters.hasif.ssl.key.password=testcluster" \
--jars spark-sql-kafka-0-10_2.11-2.2.0.jar \
--class TestApp test_app_2.11-0.1.jar \

jaas.conf

KafkaClient {
 com.sun.security.auth.module.Krb5LoginModule required
 useTicketCache=true
 principal="hasif.subair@TEST.ABC"
 useKeyTab=true
 serviceName="kafka"
 keyTab="hasif.subair.keytab"
 client=true;
};

任何帮助将不胜感激。

最佳答案

Kafka自己的配置可以通过DataStreamReader.option加上kafka.前缀来设置,例如

val clusterName = "hasif"
stream.option(s"spark.kafka.clusters.${clusterName}.kafka.ssl.keystore.location", "/etc/connect_ts/keystore.jks")

使用 kafka.ssl.truststore.location 而不是 ssl.truststore.location。 同样的,你可以为其他属性添加kafka前缀并尝试。

关于scala - 具有 SASL_SSL 身份验证的 Kafka Spark 结构化流,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60450182/

相关文章:

multithreading - 从另一个线程修改局部变量是什么意思?

arrays - 如何根据索引访问 Spark RDD 元素数组

scala - 为什么Scala编译器说逆变类型A出现在类型> : A <: Any of type B?的协变位置

apache-spark - 发送Spark流指标以打开tsdb

apache-kafka - 如何在 apache kafka 主题中获取最后一个事件状态

scala - 真的需要 scala.util.automata、scala.util.regexp 和 scala.util.grammar 吗?

java - 如何通过避免 apache Spark 中的平面映射操作来提高性能

java - 如何使用java在Apache Spark程序中指定Hive的元存储?

spring-boot - 带有 spring-kafka 的 Kafka 死信队列(DLQ)

java - 无法从 Windows 生成在 WSL 2 上运行的 Kafka 主题