hadoop - File added in Hive (Hadoop) is not visible in the warehouse?

I can add a file in Hive like this:

hive> add file /home/vis/Documents/def.txt;

hive>list files;
/home/vis/Documents/def.txt

The problem is that this file does not show up in my warehouse.

Is it possible to see it in the Hive warehouse (/user/hive/warehouse)?

If not, how can I see the file in Hive?

Best Answer

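Hive's ADD FILE command puts the file in the distributed cache, not in the warehouse. On each node, cached files are copied into the local directory configured by mapred.local.dir. The distributed cache is meant for distributing files that must be present on every node for MapReduce jobs to use, in this case the jobs that run Hive queries. That is why the file never appears under /user/hive/warehouse.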
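In other words, a file added this way is meant to be consumed by the tasks of a query, not browsed in the warehouse. A minimal sketch of the usual pattern, assuming a hypothetical streaming script my_mapper.py and a hypothetical table src (neither is from the question): once added, the file can be referenced by its base name, because each task gets its own local copy.

```sql
-- Hypothetical script path and table name, for illustration only.
ADD FILE /home/vis/Documents/my_mapper.py;

-- Shows the resources registered in this session (not the warehouse).
LIST FILES;

-- Each task runs the script from its local copy, found by base name.
SELECT TRANSFORM (key, value)
  USING 'python my_mapper.py'
  AS (key STRING, value STRING)
FROM src;
```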

Cloudera has a document that gives examples. Depending on your goal, you may want to load the data into HDFS first and then create an external table:

CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
 COMMENT 'This is the staging page view table'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION '<hdfs_location>';
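The upload step that comes before this CREATE EXTERNAL TABLE can be done from the Hive CLI itself with dfs commands. A sketch, assuming a hypothetical HDFS directory /user/vis/page_view and that def.txt is comma-delimited with columns matching the schema; <hdfs_location> would then be '/user/vis/page_view'. Dropping an external table later removes only the metadata and leaves these files in place.

```sql
-- Hypothetical target directory; adjust to your cluster.
dfs -mkdir -p /user/vis/page_view;

-- Copy the local file into HDFS so the external table can read it.
dfs -put /home/vis/Documents/def.txt /user/vis/page_view/;

-- Confirm the file is now in HDFS.
dfs -ls /user/vis/page_view;
```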

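If you want the data files to be part of the warehouse, omit the EXTERNAL keyword; Hive then creates a managed table whose data is stored under the warehouse directory: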

CREATE TABLE page_view(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;
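To get the question's local file to appear under the warehouse, a managed table plus LOAD DATA is enough. A sketch, assuming a hypothetical single-column table def_data in the default database and the default warehouse location. LOAD DATA copies files as-is without converting them, so a plain text file belongs in a TEXTFILE table rather than a SEQUENCEFILE one like the table above.

```sql
-- Hypothetical managed table holding one line of text per row.
CREATE TABLE def_data (line STRING)
STORED AS TEXTFILE;

-- LOCAL copies the file from the client machine into the table's directory.
LOAD DATA LOCAL INPATH '/home/vis/Documents/def.txt' INTO TABLE def_data;

-- With default settings the file should now be listed here.
dfs -ls /user/hive/warehouse/def_data;
```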

A similar question on Stack Overflow: https://stackoverflow.com/questions/25408011/
