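Is it advisable to use numeric columns as partition keys? Is there any performance difference in SELECT queries against a table partitioned on a numeric column versus one partitioned on a string column?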
Best answer
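Yes. According to the official Impala documentation, the choice does make a difference.
Rather than go into detail myself, I'll paste the relevant passage from the docs, since I think it makes the point well: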
"Although it might be convenient to use STRING columns for partition keys, even when those columns contain numbers, for performance and scalability it is much better to use numeric columns as partition keys whenever practical. Although the underlying HDFS directory name might be the same in either case, the in-memory storage for the partition key columns is more compact, and computations are faster, if partition key columns such as YEAR, MONTH, DAY and so on are declared as INT, SMALLINT, and so on."
Reference: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_string.html
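To make the quoted advice concrete, here is a minimal sketch of the two declarations in Impala SQL. The table names (`events_str`, `events_num`) and the `payload` column are hypothetical and exist only for illustration; the exact integer widths are a judgment call, not something the docs prescribe.

```sql
-- Hypothetical table: partition keys declared as STRING.
-- Convenient, but this is the layout the docs advise against.
CREATE TABLE events_str (payload STRING)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS PARQUET;

-- Hypothetical table: the same keys declared as integer types.
-- The HDFS directory names (e.g. year=2018/month=8/day=29) can be
-- identical to the STRING case; only the declared type differs.
CREATE TABLE events_num (payload STRING)
PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT)
STORED AS PARQUET;

-- Partition filters then compare numbers rather than strings.
SELECT COUNT(*)
FROM events_num
WHERE year = 2018 AND month = 8;
```

How much faster the numeric version is will depend on the number of partitions and on the workload, so it is worth measuring on your own data before migrating an existing table.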
This question on apache-spark (Hive/Impala performance of string partition keys vs. integer partition keys) comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/52082114/