sql - MapReduce job keeps running at map = 0%, reduce = 0% for hours

Tags: sql hadoop hive mapreduce

I am running a Hive query like this one:

create table table1 as select split(comments,' ') as words from table2;

The comments column holds reviews as space-separated strings.

When I run this query, a MapReduce job starts and then stays at map = 0% for hours. No errors are reported during that time.
hive> create table jw_1 as select split(comments,' ') from removed_null_values;
Query ID = xxx-190418201314_7781cf59-6afb-4e82-ab75-c7e343c4985e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555607912038_0013, Tracking URL = http://xxx-VirtualBox:8088/proxy/application_1555607912038_0013/
Kill Command = /usr/local/bin/hadoop-3.2.0/bin/mapred job  -kill job_1555607912038_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-18 20:13:30,568 Stage-1 map = 0%,  reduce = 0%
2019-04-18 20:14:31,140 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 39.6 sec
2019-04-18 20:15:31,311 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 101.64 sec
2019-04-18 20:16:31,451 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 146.5 sec
2019-04-18 20:17:31,684 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 212.08 sec


But when I try
select split(comments,' ') from table2;

I can see the comments as arrays in the shell.
["\"Lauren","was","promptly","responsive","in","advance","of","our","booking.","providing","a","lot","of","helpful","info.","And","she","stayed","in","contact","and","was","readily","available","prior","to","and","during","our","stay.","which","was","awesome.","The","location.","price","and","privacy","were","the","real","benefits."]


I have also run some other queries for which the MapReduce job completes and produces the expected results.

I am currently using Hive 3.1.1.

Basically, I want to create a new table containing the arrays of words and then tokenize that column.

I am new to Hive and am doing sentiment analysis on a 35 MB data file.

Best answer

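In the first case, the Hive query is converted into a MapReduce job, and you most likely do not have the resources needed to complete it. Check YARN (or MR1) to determine whether you have enough compute resources to run the MapReduce job.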
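As a rough starting point for that check, the statements below print a few memory settings from the Hive shell. This is only a sketch: the property names are standard Hadoop and YARN settings, but whether each one is visible from your Hive session, and which values suit a single VirtualBox VM, are assumptions you will need to verify on your own cluster.

```sql
-- Print the container memory requested for each map task (MB).
SET mapreduce.map.memory.mb;

-- Print the memory requested for the MapReduce ApplicationMaster (MB).
SET yarn.app.mapreduce.am.resource.mb;

-- Print the total memory one NodeManager offers to containers (MB).
-- Hive reports the property as undefined if it is not in the loaded configuration.
SET yarn.nodemanager.resource.memory-mb;
```

If the ApplicationMaster and one map container together ask for more memory than the NodeManager offers, the job may not be able to make progress. The ResourceManager web UI (port 8088 in the tracking URL shown in the log) also shows the memory and vcores currently available and allocated.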

As for the second query, some Hive queries do not trigger a MapReduce job at all, which is why that one returns results. For more information, see "How does Hive decide when to use map reduce and when not to?"
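One way to see this difference for yourself, sketched here using the table and column names from the question, is to inspect the fetch-task setting and the query plan. The property `hive.fetch.task.conversion` controls which simple queries Hive answers with a direct fetch instead of a MapReduce job; the exact plan output depends on your Hive version and configuration.

```sql
-- Print the current fetch-task conversion mode (none, minimal, or more).
SET hive.fetch.task.conversion;

-- Show the plan for the plain SELECT; a plan consisting only of a
-- fetch stage means no MapReduce job is launched for it.
EXPLAIN SELECT split(comments, ' ') FROM table2;

-- Show the plan for the CREATE TABLE AS SELECT; this form has to write
-- its output to a new table, so it is planned with additional stages.
EXPLAIN CREATE TABLE table1 AS SELECT split(comments, ' ') AS words FROM table2;
```

Setting `hive.fetch.task.conversion=none` for the session should force even the plain SELECT through MapReduce, which can help confirm whether the stall is tied to MapReduce execution itself rather than to the query.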

A similar question was found on Stack Overflow: https://stackoverflow.com/questions/55755533/
