hadoop - How can I upload Impala query output to HDFS "directly" (from impala-shell)?

I want to upload the results of an Impala query to HDFS. I run the query through impala-shell:

impala-shell -B --output_delimiter=',' -o result.txt -q " select *
                                                          from my_table
                                                          where my_conditions"

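...which stores result.txt locally, and then I upload that text file to HDFS.
However, the text file turned out to be so large that it puts heavy pressure on the system (disk I/O, for example).
So I tried to store the query result in a variable in a bash script instead, but got this error: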

xrealloc: cannot allocate #######bytes ( ####bytes allocated) command result

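I think the size of the result is the cause. Is there a way to upload the query result to HDFS "directly"? Or is there another solution?

Best answer

As @koushik-roy mentioned in the comments, the best way to load the results into HDFS is to create another Hive table. Something like: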

impala-shell -q "create table result_table as select * from my_table where my_conditions"
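If the table's files should end up as comma-delimited text in a known HDFS directory, the CREATE TABLE AS SELECT statement can, as far as I know, carry ROW FORMAT, STORED AS, and LOCATION clauses in Impala. The following is a minimal sketch under that assumption; the function name create_result_table and the directory passed to it are hypothetical, and my_table and my_conditions are the placeholders from the question.

```shell
#!/usr/bin/env bash
# Sketch only: materialize the query result as comma-delimited text files
# written by Impala itself, so no data passes through the client machine.
# create_result_table and the HDFS directory argument are hypothetical names.
create_result_table() {
    local hdfs_dir="$1"   # target HDFS directory for the table's data files
    impala-shell -q "
        CREATE TABLE result_table
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION '${hdfs_dir}'
        AS SELECT * FROM my_table WHERE my_conditions"
}

# Example call (the path is a placeholder):
#   create_result_table /user/me/result
```

With this approach the output is a directory of data files rather than a single file, and the table stays registered in the metastore until it is dropped.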

However, if you only need a "plain file", you can try piping the query output to the HDFS shell put command, like this:

impala-shell -B --output_delimiter=',' -q " select * from my_table where my_conditions" | hadoop fs -put - </your/hdfs/path/for/result>

Note the "read input from standard input" option (-) in the put command.
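The pipe can be wrapped in a small script so that a failed query does not go unnoticed. This is a minimal sketch, assuming bash; the function name export_query_to_hdfs is hypothetical, and the query and HDFS path are supplied by the caller. By default a pipeline reports only the exit status of its last command, so pipefail is enabled to surface an impala-shell failure as well.

```shell
#!/usr/bin/env bash
# Make a pipeline fail if any command in it fails, not only the last one.
set -o pipefail

# Sketch only: stream query results straight into an HDFS file without
# writing a local copy or holding the whole result in a shell variable.
# export_query_to_hdfs is a hypothetical helper name.
export_query_to_hdfs() {
    local query="$1"       # SQL text to run
    local hdfs_path="$2"   # destination file in HDFS
    impala-shell -B --output_delimiter=',' -q "$query" \
        | hadoop fs -put - "$hdfs_path"
}

# Example call (query and path are placeholders):
#   export_query_to_hdfs "select * from my_table where my_conditions" /user/me/result.csv
```

Because the data is streamed through the pipe, local disk usage stays flat regardless of result size, although the rows still travel through the machine that runs impala-shell.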

A similar question to "hadoop - How can I upload Impala query output to HDFS "directly" (from impala-shell)?" can be found on Stack Overflow: https://stackoverflow.com/questions/64441291/
