apache-spark - Cannot access Hive via Spark in a Kerberos environment: Client cannot authenticate via:[TOKEN, KERBEROS]

Tags: apache-spark hadoop hive apache-tez kerberos-delegation

Hi all, I am running Spark (2.4.4) in a Kerberos environment and have written code to query a Hive table through Spark. I also run kinit before spark-submit, but I still get

java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS];

Here is my code:

    import org.apache.spark.sql.SparkSession

    @transient lazy val spark: SparkSession = getSparkSession()

    def getSparkSession(): SparkSession = {
      log.info("Creating spark session")
      // Note: master("local[*]") is hard-coded here, while the job is submitted
      // with --master yarn --deploy-mode cluster; the two settings can conflict.
      val sparkBuilder: SparkSession.Builder = SparkSession.builder()
        .master("local[*]")
        .appName("Query Hive Via Spark")
        .config("hive.exec.scratchdir", "/tmp/hive")
        .config("hive.exec.dynamic.partition", "true")
        .config("hive.exec.dynamic.partition.mode", "nonstrict")
        .config("hive.exec.max.dynamic.partitions", "1000")
        .enableHiveSupport()

      val spark: SparkSession = sparkBuilder.getOrCreate()
      registerUdfs(spark)                      // UDF registration, defined elsewhere
      spark.sparkContext.setLogLevel(logLevel) // logLevel defined elsewhere
      spark
    }

Code that accesses the Hive table via Spark SQL:

      val resultDF = spark.sql("SELECT count(*) AS cnt FROM brl_in_cash.cash_in_incoming_data WHERE insert_date='20200821'")
      resultDF.printSchema()
      resultDF.show(false)

I launch spark-submit from a shell script that first runs kinit and passes --principal $KERBEROS_PRINCIPAL --keytab $KERBEROS_KEYTAB.
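
For context, a minimal sketch of what such a wrapper script looks like (the actual script is not shown in the question; KERBEROS_PRINCIPAL and KERBEROS_KEYTAB are assumed to be set in the environment):

    #!/bin/bash
    # Obtain a Kerberos TGT on the submitting host before calling spark-submit.
    kinit -kt "$KERBEROS_KEYTAB" "$KERBEROS_PRINCIPAL"
    klist  # verify that a valid ticket was obtained

    # Then submit the job; the full command is shown below.
    spark-submit --master yarn --deploy-mode cluster \
      --principal "$KERBEROS_PRINCIPAL" \
      --keytab "$KERBEROS_KEYTAB" \
      ...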

spark-submit command:

    spark-submit --master yarn --deploy-mode cluster \
      --verbose \
      --name ${appName} \
      --principal $KERBEROS_PRINCIPAL \
      --keytab $KERBEROS_KEYTAB \
      --driver-memory 4g \
      --executor-memory 4g \
      --executor-cores 2 \
      --files ${hiveSite.xml} \
      --conf spark.hadoop.yarn.timeline-service.enabled=false \
      --conf spark.hadoop.yarn.client.failover-proxy-provider=org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider \
      --conf spark.security.credentials.EsServiceCredentialProvider.enabled=false \
      --class com.dpk.hive.HiveViaSpark "${jarPath}"

Error log:

20/08/26 13:34:17 INFO TezClient: Failed to retrieve AM Status via proxy
com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "dfghcv012.global.xyz.com/10.7.1.52"; destination host is: "dfghcv013.global.xyz.com":43890; 
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:243)
        at com.sun.proxy.$Proxy36.getAMStatus(Unknown Source)
        at org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:618)
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:697)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:205)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:532)
        at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:271)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:384)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
        at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
        at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
        at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:141)
        at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:136)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$2.apply(HiveSessionStateBuilder.scala:55)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:91)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:91)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.isTemporaryTable(SessionCatalog.scala:736)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:747)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:681)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:713)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:706)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:706)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:652)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
        at com.sc.sdm.rt.oa.recon.TestConnection$.main(TestConnection.scala:34)
        at com.sc.sdm.rt.oa.recon.TestConnection.main(TestConnection.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "dfghcv012.global.xyz.com/10.7.1.52"; destination host is: "dfghcv013.global.xyz.com":43890; 
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        ... 91 more

Any help is appreciated!

Best answer

I was able to resolve this issue.

I removed

    --files hive-site.xml

and added the Spark configuration

    --conf spark.security.credentials.hadoopfs.enabled=true

These two changes worked for me.
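
For reference, a sketch of the revised spark-submit command with both changes applied (same variables and options as the original command above):

    spark-submit --master yarn --deploy-mode cluster \
      --verbose \
      --name ${appName} \
      --principal $KERBEROS_PRINCIPAL \
      --keytab $KERBEROS_KEYTAB \
      --driver-memory 4g \
      --executor-memory 4g \
      --executor-cores 2 \
      --conf spark.hadoop.yarn.timeline-service.enabled=false \
      --conf spark.hadoop.yarn.client.failover-proxy-provider=org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider \
      --conf spark.security.credentials.EsServiceCredentialProvider.enabled=false \
      --conf spark.security.credentials.hadoopfs.enabled=true \
      --class com.dpk.hive.HiveViaSpark "${jarPath}"

In Spark 2.4, spark.security.credentials.{service}.enabled toggles the built-in delegation-token providers, so explicitly enabling the hadoopfs provider presumably supplied the HDFS delegation tokens that the failing RPC was missing.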

Regarding apache-spark - Cannot access Hive via Spark in a Kerberos environment: Client cannot authenticate via:[TOKEN, KERBEROS], we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63596797/
