hadoop - Wrong HDFS permissions with Spark SQL on yarn-cluster

Tags: hadoop apache-spark hive hdfs apache-spark-sql

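I have a simple job that reads a Hive table stored in HDFS through Spark SQL. I first ran it in yarn-client mode and had no problems. After a few runs I started launching it in yarn-cluster mode, and that is when I ran into this problem: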

I get this HDFS permission error:

Caused by:MetaException(message:org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=EXECUTE, inode="/Projects/SNB/directory/Private/table/table_ORC":hdfs:mygroup:drwxr-xr--
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1698)
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1006)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29329)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
    at com.sun.proxy.$Proxy31.getTable(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:976)
    ... 68 more

However, when I run hdfs dfs -ls on this directory, it shows:

drwxrwxrwx   - lb23598 mygroups          0 2016-12-20 17:58 /Projects/SNB/directory/Private/table/table_ORC

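So what YARN sees is out of sync with the permissions currently set in HDFS: the error reports owner hdfs, group mygroup and mode drwxr-xr--, while hdfs dfs -ls reports owner lb23598, group mygroups and mode drwxrwxrwx.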
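To make the mismatch concrete, here is a minimal sketch of the owner/group/other check that HDFS applies when a user needs EXECUTE (traverse) on a directory. The function name can_traverse and the group list "yarn hadoop" for the yarn user are assumptions made for illustration; the sketch also ignores the HDFS superuser, ACLs and the sticky bit, so it is a simplification rather than the NameNode's actual logic.

```shell
# can_traverse USER "USER_GROUPS" OWNER GROUP MODE
# Returns 0 if USER has the execute bit on a directory with the given
# owner, group and ls-style mode string (for example drwxr-xr--).
# Simplified: no superuser bypass, no ACLs, no sticky-bit handling.
can_traverse() {
  user=$1 user_groups=$2 owner=$3 group=$4 mode=$5
  if [ "$user" = "$owner" ]; then
    pos=4                      # owner execute bit
  else
    case " $user_groups " in
      *" $group "*) pos=7 ;;   # group execute bit
      *) pos=10 ;;             # other execute bit
    esac
  fi
  [ "$(printf '%s' "$mode" | cut -c"$pos")" = "x" ]
}

# Owner, group and mode as reported in the error message.
# "yarn hadoop" as the groups of user yarn is an assumption.
if can_traverse yarn "yarn hadoop" hdfs mygroup drwxr-xr--; then
  echo "as seen in the error: allowed"
else
  echo "as seen in the error: denied"
fi

# Owner, group and mode as reported by hdfs dfs -ls.
if can_traverse yarn "yarn hadoop" lb23598 mygroups drwxrwxrwx; then
  echo "as seen by hdfs dfs -ls: allowed"
else
  echo "as seen by hdfs dfs -ls: denied"
fi
```

Under these assumptions the first check is denied and the second is allowed, which matches the symptom: user yarn falls into the "other" class, and only the mode quoted in the error lacks the execute bit there. On a real cluster, hdfs groups yarn and hdfs dfs -ls -d on the path and its parent directories would supply the actual values.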

Do you have any ideas?

Thanks a lot!

Best answer

Try setting the following environment variable before submitting the job:

export HADOOP_USER_NAME=<NAME_OF_THE_USER_THAT_HAS_HDFS_PERMISSION>
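As a rough sketch of how this could look in a submit script: the user name lb23598 is taken from the hdfs dfs -ls output in the question and is only an assumption about which account should be used, and the class and jar names are hypothetical. The two --conf lines use Spark's spark.yarn.appMasterEnv.* and spark.executorEnv.* settings to forward the variable to the driver and executor containers; they are not part of the original answer and may be redundant. Note that HADOOP_USER_NAME is only honored with simple authentication; on a Kerberos-secured cluster it has no effect.

```shell
# Assumption: lb23598 (owner shown by hdfs dfs -ls in the question) is the
# account that should access the table. Replace with your own user.
export HADOOP_USER_NAME=lb23598

# Hypothetical application class and jar; replace with your own.
APP_CLASS=com.example.ReadHiveTable
APP_JAR=read-hive-table.jar

# Build the submit command as positional parameters. The --conf lines
# forward the variable to the application master (driver in cluster mode)
# and to the executors; this is an addition to the original answer.
set -- spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.yarn.appMasterEnv.HADOOP_USER_NAME=$HADOOP_USER_NAME" \
  --conf "spark.executorEnv.HADOOP_USER_NAME=$HADOOP_USER_NAME" \
  --class "$APP_CLASS" \
  "$APP_JAR"

# Dry run by default: only print the command. Set DRY_RUN=0 to submit.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$@"
else
  "$@"
fi
```

Printing the command first makes it easy to confirm that the intended user name ends up in every place before anything is submitted to the cluster.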

Source: a similar question about wrong HDFS permissions with Spark SQL on yarn-cluster, found on Stack Overflow: https://stackoverflow.com/questions/41288034/
