hadoop - Hive insert into dynamically partitioned table runs forever / hangs

Tags: hadoop hive hql emr

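Suppose we have two Hive tables, tableA and tableB. I am exploding tableA, joining it with a few other tables, and then inserting the result into tableB.

The insert works fine when tableB has no partitions, or when the insert uses static partitioning.

With dynamic partitioning, however, the MapReduce job never even starts. It appears to hang.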
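For readers who want a concrete picture, the kind of statement described above might look roughly like the following. This is only a sketch: the question does not show the real query, so the columns (`id`, `items`, `dt`, `label`) and the lookup table `dim` are hypothetical stand-ins, and the two `hive.exec.dynamic.partition*` settings are the usual session prerequisites for dynamic partition inserts rather than anything quoted from the question.

```sql
-- Hypothetical schema (not from the question):
--   tableA(id STRING, items ARRAY<STRING>, dt STRING)
--   dim(id STRING, label STRING)
--   tableB(id STRING, item STRING, label STRING) PARTITIONED BY (dt STRING)

-- Dynamic partitioning has to be allowed for the session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- No value is given for dt in the PARTITION clause, so Hive derives the
-- partition of each row from the last column of the SELECT list.
INSERT OVERWRITE TABLE tableB PARTITION (dt)
SELECT
  e.id,
  e.item,
  d.label,
  e.dt
FROM (
  -- Explode the array column of tableA into one row per element.
  SELECT a.id, a.dt, item
  FROM tableA a
  LATERAL VIEW explode(a.items) exploded AS item
) e
JOIN dim d ON d.id = e.id;
```

With static partitioning the same statement would instead name the value in the clause, for example `PARTITION (dt='2016-02-11')`, and omit `e.dt` from the SELECT list.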

To debug further, I started Hive with the following parameter:

    -hiveconf hive.root.logger=DEBUG,console

Now I can see that the job is not actually hanging. It keeps printing log lines such as:

........

    16/02/11 09:25:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:25:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2139 and EX_2140 as parent of FS_68 and child of EX_2138
    16/02/11 09:25:55 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:25:55 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2141 and EX_2142 as parent of FS_68 and child of EX_2140
    16/02/11 09:25:59 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:25:59 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2143 and EX_2144 as parent of FS_68 and child of EX_2142
    16/02/11 09:26:03 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:03 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2145 and EX_2146 as parent of FS_68 and child of EX_2144
    16/02/11 09:26:08 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:08 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2147 and EX_2148 as parent of FS_68 and child of EX_2146
    16/02/11 09:26:12 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:12 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2149 and EX_2150 as parent of FS_68 and child of EX_2148
    16/02/11 09:26:17 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:17 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2151 and EX_2152 as parent of FS_68 and child of EX_2150
    16/02/11 09:26:19 [Thread-5]: INFO metrics.MetricsSaver: Saved 8:22 records to /mnt/var/em/raw/i-63eec5e6_20160211_RunJar_14276_raw.bin
    16/02/11 09:26:21 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:21 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2153 and EX_2154 as parent of FS_68 and child of EX_2152
    16/02/11 09:26:26 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:26 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2155 and EX_2156 as parent of FS_68 and child of EX_2154
    16/02/11 09:26:30 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:30 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2157 and EX_2158 as parent of FS_68 and child of EX_2156
    16/02/11 09:26:35 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:35 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2159 and EX_2160 as parent of FS_68 and child of EX_2158
    16/02/11 09:26:40 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:40 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2161 and EX_2162 as parent of FS_68 and child of EX_2160
    16/02/11 09:26:45 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:45 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2163 and EX_2164 as parent of FS_68 and child of EX_2162
    16/02/11 09:26:49 [Thread-5]: INFO metrics.MetricsSaver: Saved 8:22 records to /mnt/var/em/raw/i-63eec5e6_20160211_RunJar_14276_raw.bin
    16/02/11 09:26:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:50 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2165 and EX_2166 as parent of FS_68 and child of EX_2164
    16/02/11 09:26:56 [main]: INFO optimizer.SortedDynPartitionOptimizer: Sorted dynamic partitioning optimization kicked in..
    16/02/11 09:26:56 [main]: INFO optimizer.SortedDynPartitionOptimizer: Inserted RS_2167 and EX_2168 as parent of FS_68 and child of EX_2166

..............

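These log lines keep printing indefinitely! Without dynamic partitioning, however, the complete insert query finishes successfully in about 10 minutes.

Also, the dynamic partition column has only 3 distinct values across the whole table, so this is not a case of choosing an unsuitable column for dynamic partitioning.

So,

  1. What do the log lines being printed mean?

  2. What optimization or remedy does this situation call for?

Thanks a lot in advance for your help!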

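Best answer

Setting the following parameter worked:

    SET hive.optimize.sort.dynamic.partition=false;

My Hive version is 0.13.1. Quoting the Apache wiki for this parameter:

hive.optimize.sort.dynamic.partition

Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)

Added In: Hive 0.13.0 with HIVE-6455

When enabled, the dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer, thereby reducing the memory pressure on reducers.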
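As a minimal sketch of how the fix could be applied in practice, assuming an interactive Hive session on a version where the default is true (0.13.0 or 0.13.1), the property can be inspected and then turned off before re-running the insert:

```sql
-- SET with a property name and no value prints the current setting.
SET hive.optimize.sort.dynamic.partition;

-- Turn off the sorted dynamic partition optimization for this session only.
SET hive.optimize.sort.dynamic.partition=false;

-- Re-run the dynamic-partition INSERT in this same session; the setting
-- does not carry over to sessions started later.
```

Because this only changes the session, other jobs are unaffected. The trade-off, going by the wiki text quoted above, is that reducers may again keep several record writers open at once, which should be tolerable with only 3 distinct partition values.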

Thanks.

The original question, "Hive insert into dynamically partitioned table runs forever / hangs", can be found on Stack Overflow: https://stackoverflow.com/questions/35335579/
