hadoop - Choosing a Hive execution engine

Tags: hadoop hive

Of the three Hive execution engines shown below, which one is recommended when working in a Hadoop cluster? In which use cases is each one the ideal choice?

I ran a query on a 400M sample, and Tez returned results faster than the other two engines. The query consisted of grouping and filtering.

set hive.execution.engine=spark;
set hive.execution.engine=tez;
set hive.execution.engine=mr;

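I am trying to work this out by looking at the query itself: given a particular query, I would like to be able to tell which engine will return results faster than the others.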
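A minimal sketch of how such a comparison could be run: the same group-by/filter query is executed once per engine in a single Hive session, and the elapsed time reported by the client is compared. The table `sales` and its columns are hypothetical, and the `spark` and `tez` settings only work if those engines are installed and configured on the cluster.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE, sale_date STRING).
-- Each block switches the engine for the current session, then runs the same query.

set hive.execution.engine=mr;     -- classic MapReduce (deprecated since Hive 2.x)
SELECT region, SUM(amount) AS total
FROM sales
WHERE sale_date >= '2019-01-01'
GROUP BY region;

set hive.execution.engine=tez;    -- requires Tez on the cluster
SELECT region, SUM(amount) AS total
FROM sales
WHERE sale_date >= '2019-01-01'
GROUP BY region;

set hive.execution.engine=spark;  -- requires Hive on Spark to be configured
SELECT region, SUM(amount) AS total
FROM sales
WHERE sale_date >= '2019-01-01'
GROUP BY region;
```

Running each variant more than once gives a fairer picture, since the first run of Tez or Spark in a session may include container or session start-up time.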

Best answer

The benefits that Tez provides over the MapReduce execution engine when used with Hive are:
● Tez does not write intermediate results to HDFS between the steps of a Hive query. Tez models the query as a
Directed Acyclic Graph (DAG), and the output of each intermediate step is passed directly to the next step in the
graph, instead of being written to HDFS between chained jobs as happens with the MapReduce engine.
Removing this I/O saves a lot of time when dealing with large amounts of data.
● Tez and YARN together let you reuse containers and the objects held in them. If two
tasks need the same object (say, a data frame) and run in the same container,
you do not have to create that object again and again; you can reuse it. This leads to better
resource management and also improves performance.
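One way to see the DAG-versus-MapReduce difference described above is to compare query plans with Hive's `EXPLAIN` statement under each engine. The sketch below reuses a hypothetical `sales` table; the exact plan text depends on the Hive version, but under MapReduce a multi-step query is typically split into separate map/reduce stages, while under Tez the steps appear as vertices of a single DAG.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE, sale_date STRING).

set hive.execution.engine=mr;
-- Plan is expressed as one or more MapReduce stages.
EXPLAIN
SELECT region, SUM(amount) AS total
FROM sales
WHERE sale_date >= '2019-01-01'
GROUP BY region
ORDER BY total DESC;

set hive.execution.engine=tez;
-- Plan is expressed as Tez vertices (map and reducer vertices) connected in one DAG.
EXPLAIN
SELECT region, SUM(amount) AS total
FROM sales
WHERE sale_date >= '2019-01-01'
GROUP BY region
ORDER BY total DESC;
```

Comparing the number of stages in the two plans is a quick, query-level hint of how much intermediate I/O the MapReduce engine would add.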

For the Spark engine, see:

https://community.cloudera.com/t5/Support-Questions/Hive-execution-engine-set-to-Spark-is-recommended/m-p/177906

If you want to run interactive queries, LLAP (Live Long and Process) is a good fit.
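A minimal sketch of switching a session to LLAP, assuming LLAP daemons are already running on the cluster and Hive is configured to reach them (for example through `hive.llap.daemon.service.hosts`). LLAP runs on top of Tez rather than replacing it; the property values below may differ between Hive versions and distributions, and the `sales` table is hypothetical.

```sql
-- Assumption: LLAP daemons are up and registered; otherwise these settings will fail.
set hive.execution.engine=tez;     -- LLAP executes Tez fragments
set hive.execution.mode=llap;      -- run query fragments in LLAP daemons instead of YARN containers
set hive.llap.execution.mode=all;  -- allow all fragments (not just map work) to run in LLAP

-- Short interactive query against a hypothetical table.
SELECT region, COUNT(*) AS cnt
FROM sales
WHERE sale_date = '2019-08-01'
GROUP BY region;
```

Because the daemons stay alive and cache data between queries, LLAP avoids per-query container start-up, which is what makes it suitable for interactive workloads.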

Original question on Stack Overflow: https://stackoverflow.com/questions/57668455/
