csv - Loading quote-enclosed .csv data into Hive

Tags: csv hadoop hive

My .csv file has every field enclosed in double quotes:

    "13","9827259163","0","D","2"
    "13","9827961481","0","D","2"
    "13","9827202228","0","A","2"
    "13","9827529897","0","A","2"
    "13","9827700249","0","A","2"
    "12","9883219029","0","A","2"
    "17","9861065312","0","A","2"
    "17","9861220761","0","D","2"
    "13","9827438384","0","A","2"
    "13","9827336733","0","D","2"
    "13","9827380905","0","D","2"
    "13","9827115358","0","D","2"
    "17","9861475884","0","D","2"
    "17","9861511646","0","D","2"
    "17","9861310397","0","D","2"
    "13","9827035035","0","A","2"
    "13","9827304969","0","D","2"
    "13","9827355786","0","A","2"
    "13","9827702373","0","A","2"

As I would in MySQL, I tried using the ENCLOSED BY clause, as follows:

CREATE EXTERNAL TABLE dnd (ServiceAreaCode varchar(50), PhoneNumber varchar(15), Preferences varchar(15), Opstype varchar(15), PhoneType varchar(10))
ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ',' ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
LOCATION '/dnd';

However, it fails with the following error:

NoViableAltException(26@[1704:103: ( tableRowFormatMapKeysIdentifier )?])
    at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
    at org.antlr.runtime.DFA.predict(DFA.java:144)
    at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:30427)
    at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:30662)
    at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4683)
    at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2144)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 5:33 cannot recognize input near 'ENCLOSED' 'BY' ''"'' in serde properties specification

Is there any way to import this file directly? Thanks in advance.

Best answer

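Take a different approach. Hive's ROW FORMAT DELIMITED syntax has no ENCLOSED BY clause, which is why the statement fails to parse. The solution is to use a SerDe instead. Download the SerDe jar from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar

Then run the following steps at the Hive prompt: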

add jar path/to/csv-serde.jar;

create table dnd (ServiceAreaCode varchar(50), PhoneNumber varchar(15), Preferences varchar(15), Opstype varchar(15), PhoneType varchar(10))
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\,",
"quoteChar" = "\"")
stored as textfile
;

Then load the data from the given path with the following query:

load data local inpath 'path/xyz.csv' into table dnd;

Then run:

select * from dnd;
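
As an alternative to downloading a jar, here is a minimal sketch that assumes Hive 0.14 or later, where a CSV SerDe (org.apache.hadoop.hive.serde2.OpenCSVSerde) ships with Hive itself. The table name dnd_opencsv is hypothetical, and /dnd is the location used in the question. This SerDe reads every column as a string regardless of the declared type, so the columns are declared as string here.

```sql
-- Sketch only: assumes Hive 0.14+ with the built-in OpenCSVSerde.
-- dnd_opencsv is a hypothetical table name; /dnd is the path from the question.
CREATE EXTERNAL TABLE dnd_opencsv (
  ServiceAreaCode string,
  PhoneNumber     string,
  Preferences     string,
  Opstype         string,
  PhoneType       string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- fields are comma-separated
  "quoteChar"     = "\""   -- fields are enclosed in double quotes
)
STORED AS TEXTFILE
LOCATION '/dnd';

-- Quick check that the enclosing quotes are stripped from the values.
SELECT * FROM dnd_opencsv LIMIT 5;
```

Because the table is external and points at the existing directory, no separate load step should be needed in this variant. Cast the columns in a view or a follow-up query if typed values are required.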

Source: a similar question on Stack Overflow, "csv - Loading quote-enclosed .csv data into Hive": https://stackoverflow.com/questions/24907257/