All of this follows this guide: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
hive> ADD JAR /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar;
Added /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar
In /tmp/new I have a file, abc.json, with the following contents:
http://pastie.org/9504218
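The actual contents are in the paste above and are not reproduced here. For illustration only, the JSON SerDe reads one JSON object per line with fields matching the table schema below; a hypothetical record (all values made up) could be created like this:

```shell
# Illustration only: a single-line, made-up tweet record whose fields
# line up with the table schema defined below. The real file contents
# are in the pastie link above.
cat > abc.json <<'EOF'
{"id": 1, "created_at": "Mon Aug 25 12:00:00 +0000 2014", "source": "web", "favorited": false, "text": "hello world", "in_reply_to_screen_name": null, "user": {"screen_name": "alice", "name": "Alice", "friends_count": 10, "followers_count": 20, "statuses_count": 5, "verified": false, "utc_offset": 0, "time_zone": "UTC"}, "entities": {"urls": [], "user_mentions": [], "hashtags": []}}
EOF
```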
The CREATE EXTERNAL TABLE command runs fine, but no data is picked up:
hive>
> CREATE EXTERNAL TABLE tweets (
> id BIGINT,
> created_at STRING,
> source STRING,
> favorited BOOLEAN,
> retweeted_status STRUCT<
> text:STRING,
> user:STRUCT<screen_name:STRING,name:STRING>,
> retweet_count:INT>,
> entities STRUCT<
> urls:ARRAY<STRUCT<expanded_url:STRING>>,
> user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
> hashtags:ARRAY<STRUCT<text:STRING>>>,
> text STRING,
> user STRUCT<
> screen_name:STRING,
> name:STRING,
> friends_count:INT,
> followers_count:INT,
> statuses_count:INT,
> verified:BOOLEAN,
> utc_offset:INT,
> time_zone:STRING>,
> in_reply_to_screen_name STRING
> )
> PARTITIONED BY (datehour INT)
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> LOCATION '/tmp/new';
OK
Time taken: 0.142 seconds
SELECT *:
hive> select * from tweets;
OK
Time taken: 0.392 seconds
What could be going on here?
Best Answer
So the problem is that you created an external table with partitions, but you never added any partitions in Hive, nor created the corresponding directories in HDFS.
Here are the steps you can follow:
1.) Run the CREATE TABLE statement.
2.) In the directory /tmp/new/, create a subdirectory datehour=<some int value>, and put your .json file inside it.
3.) Run an ALTER TABLE statement to add this partition to the metadata:
alter table tweets add partition(datehour=<some int value>);
4.) Now run the SELECT statement.
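Put concretely, the steps above might look like the sketch below. The partition value 2014082810 is just an example, and the local commands only demonstrate the directory layout Hive expects; on a real cluster you would use hadoop fs and hive instead, as shown in the comments:

```shell
# On a real cluster (example partition value; adjust to your data):
#   hadoop fs -mkdir /tmp/new/datehour=2014082810
#   hadoop fs -put abc.json /tmp/new/datehour=2014082810/
#   hive -e "ALTER TABLE tweets ADD PARTITION (datehour=2014082810);"
#
# Local demonstration of the same layout:
TABLE_DIR=$(mktemp -d)                      # stand-in for /tmp/new
mkdir -p "$TABLE_DIR/datehour=2014082810"   # one subdirectory per partition value
echo '{"id": 1}' > "$TABLE_DIR/datehour=2014082810/abc.json"
ls "$TABLE_DIR"
# → datehour=2014082810
```

Once the partition directory exists and the ALTER TABLE has registered it in the metastore, the SELECT will find the data.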
Hope this helps...!!!
Regarding "hadoop - Unable to use JSON SerDe in Hive", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25507788/