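Of the three Hive execution engines shown below, which one is recommended when working on a Hadoop cluster? And what are the use cases in which each one is the ideal choice?
I tried a query on a 400M sample, and the Tez engine returned the output faster than the other two. The query involves grouping and filtering.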
set hive.execution.engine=spark;
set hive.execution.engine=tez;
set hive.execution.engine=mr;
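What I am trying to get to is this: by looking at a query, I should be able to tell which engine will return the result faster than the others.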
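One rough way to build that intuition is to run the same query under each engine and compare the elapsed time Hive reports, then look at the plan with EXPLAIN. The sketch below is only an illustration: the table `events` and its columns `country` and `status` are hypothetical stand-ins for a grouping-and-filtering query, and the `spark` setting only works if Hive on Spark is configured on the cluster.

```sql
-- Hypothetical table and columns (events, country, status); substitute your own.

-- Run 1: classic MapReduce
set hive.execution.engine=mr;
SELECT country, COUNT(*) AS n
FROM events
WHERE status = 'ok'
GROUP BY country;

-- Run 2: Tez
set hive.execution.engine=tez;
SELECT country, COUNT(*) AS n
FROM events
WHERE status = 'ok'
GROUP BY country;

-- EXPLAIN prints the plan for the currently selected engine without running it,
-- which helps show how many stages the query is split into.
EXPLAIN
SELECT country, COUNT(*) AS n
FROM events
WHERE status = 'ok'
GROUP BY country;
```

Timings from a single run can be skewed by caching and cluster load, so repeating each run a few times gives a fairer comparison.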
Best answer
The benefits that Tez provides over the MapReduce execution engine when using Hive are:
● Tez avoids writing intermediate data to HDFS between the steps of a Hive query. Tez models the query as a
Directed Acyclic Graph (DAG), and the data from an intermediate step is passed on to the next step in the
graph, whereas the MapReduce engine runs a chain of separate jobs and writes each job's output to HDFS.
Removing these I/O operations saves a lot of time when dealing with large amounts of data.
● Tez and YARN together let you reuse containers, and the objects held in them, across tasks within a
Tez session. If two tasks require the same object and run within the same container,
the object need not be created again and again; it can be reused. This leads to better
management of resources and also helps improve performance.
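The container reuse described above can be made explicit in a Hive session. This is a minimal sketch, not a tuning recommendation: `tez.am.container.reuse.enabled` is a Tez property that is typically already on by default, and the prewarm settings are optional Hive properties whose usefulness depends on the cluster. The value 4 is an arbitrary illustrative number.

```sql
-- Select Tez as the execution engine for this session.
set hive.execution.engine=tez;

-- Let the Tez application master reuse containers across tasks
-- (usually the default; shown here only to make it visible).
set tez.am.container.reuse.enabled=true;

-- Optional: ask Hive to start some containers ahead of the first query.
-- The container count below is an assumption for illustration.
set hive.prewarm.enabled=true;
set hive.prewarm.numcontainers=4;
```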
For the Spark engine, see here.
If you want to run interactive queries, then LLAP (Live Long and Process), an execution mode that runs on top of Tez, is a suitable choice.
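As a hedged sketch of what switching a session to LLAP can look like: the settings below assume Hive 2.x or later with LLAP daemons already running on the cluster, which is an assumption about your environment. Without running daemons these settings will not help.

```sql
-- LLAP runs on top of Tez, so Tez stays the execution engine.
set hive.execution.engine=tez;

-- Run query fragments in LLAP daemons instead of fresh containers.
set hive.execution.mode=llap;

-- "all" asks Hive to place as much of the query as possible in LLAP;
-- other values trade this off against container execution.
set hive.llap.execution.mode=all;
```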
Regarding hadoop - choosing a Hive execution engine, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57668455/