hadoop - Oozie Hive action with Kerberos on HDP-1.3.3

Tags: hadoop hive oozie hortonworks-data-platform

I am trying to execute a Hive script from an Oozie Hive action in a Kerberos-enabled environment.

Here is my workflow.xml:

<action name="hive-to-hdfs">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>script.q</script>
        <param>HIVE_EXPORT_TIME=${hiveExportTime}</param>
    </hive>
    <ok to="pass"/>
    <error to="fail"/>
</action>

I am running into problems when trying to connect to the Hive metastore:

6870   [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://10.0.0.242:9083
Heart beat
Heart beat
67016  [main] WARN  hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)

67018  [main] INFO  hive.metastore - Waiting 1 seconds before next connection attempt.
68018  [main] INFO  hive.metastore - Connected to metastore.
Heart beat
Heart beat
128338 [main] WARN  org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)

129339 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://10.0.0.242:9083
Heart beat
Heart beat
189390 [main] WARN  hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)

189391 [main] INFO  hive.metastore - Waiting 1 seconds before next connection attempt.
190391 [main] INFO  hive.metastore - Connected to metastore.
Heart beat
Heart beat
250449 [main] ERROR org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table SESSION_MASTER
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:953)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:887)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)

The workflow runs fine when I disable Kerberos security.

Best answer

To make your Oozie Hive action work on a secure cluster, you need to add a <credentials> section containing a credential of type "hcat" to your workflow.

Your workflow will then look something like this:

<workflow-app name='workflow' xmlns='uri:oozie:workflow:0.2.5'>
    <credentials>
        <credential name='hcat' type='hcat'>
            <property>
                <name>hcat.metastore.uri</name>
                <value>HCAT_URI</value>
            </property>
            <property> 
                <name>hcat.metastore.principal</name>
                <value>HCAT_PRINCIPAL</value>
            </property>
        </credential>
    </credentials>

    <action name="hive-to-hdfs" cred="hcat">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.q</script>
            <param>HIVE_EXPORT_TIME=${hiveExportTime}</param>
        </hive>
        <ok to="pass"/>
        <error to="fail"/>
    </action>
</workflow-app>
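On a Kerberized cluster, the hive-site.xml you ship via <job-xml> also needs the secure-metastore client settings, or the Thrift handshake times out the way the log above shows. A minimal sketch — the metastore URI is taken from the log output, while the principal hive/_HOST@EXAMPLE.COM is an illustrative value you must replace with your cluster's own:

```xml
<!-- hive-site.xml fragment (illustrative values; substitute your own) -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://10.0.0.242:9083</value>
</property>
<property>
    <!-- Enables SASL (Kerberos) authentication to the metastore -->
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
</property>
<property>
    <!-- Kerberos principal of the metastore service; _HOST is expanded to the metastore hostname -->
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@EXAMPLE.COM</value>
</property>
```

The same URI and principal are what you would plug into the HCAT_URI and HCAT_PRINCIPAL placeholders of the credential above.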

See also the Oozie documentation on this feature.
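Note that the "hcat" credential type must also be registered on the Oozie server side; in oozie-site.xml this is typically done with the following property (a sketch assuming a stock HDP-style setup — check your distribution's defaults):

```xml
<!-- oozie-site.xml: register the HCat credential implementation under the name "hcat" -->
<property>
    <name>oozie.credentials.credentials.classes</name>
    <value>hcat=org.apache.oozie.action.hadoop.HCatCredentials</value>
</property>
```

If this mapping is missing, Oozie will reject the workflow with an error about an unknown credential type rather than the metastore timeouts seen above.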

Regarding "hadoop - Oozie Hive action with Kerberos on HDP-1.3.3", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25750528/
