xml - Error when parsing XML in Hive

Tags: xml hadoop xpath hive


I am parsing an XML file in Hive, and I am using hivexmlserde to do it. When I write the code below and execute it, I get the following error.

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: The number of XPath expressions does not much the number of columns

But the number of columns and the number of XPath expressions are the same.

Here is my code:

add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE INFO(
statusCode string,
title string,
startTime string,
endTime string,
frequencyValue string,
frequencyUnits string,
strengthValue string,
strengthUnits string,
routecode string,
routecodeSystem string,
routedisplayName string,
routecodesystemName string,
ugcode string,
uname string,
ucodeSystem string,
codeSystemName string,
ageForm string,
tr_code string,
tr_description string,
tr_codesystem string,
tr_codesystemname string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.statusCode"="Document/xxx/statusCode/text()",
"column.xpath.title"="Document/xxx/code/code/text()",
"column.xpath.startTime"="Document/xxx/startTime/text()",
"column.xpath.endTime"="Document/xxx/endTime/text()",
"column.xpath.frequencyValue"="Document/xxx/frequencyValue/text()",
"column.xpath.frequencyUnits"="Document/xxx/frequencyUnits/text()",
"column.xpath.strengthValue"="Document/xxx/strengthValue/text()",
"column.xpath.strengthUnits"="Document/xxx/strengthUnits/text()",
"column.xpath.routecode"="Document/xxx/entryInfo/routeCode/code/text()",
"column.xpath.routecodeSystem"="Document/xxx/entryInfo/routeCode/codeSystem/text()",
"column.xpath.routedisplayName"="Document/xxx/entryInfo/routeCode/displayName/text()",
"column.xpath.routecodesystemName"="Document/xxx/entryInfo/routeCode/codeSystemName/text()",
"column.xpath.ugcode"="Document/xxx/entryInfo/productCode/code/text()",
"column.xpath.ugname"="Document/xxx/entryInfo/productCode/displayName/text()",
"column.xpath.ugcodeSystem"="Document/xxx/entryInfo/productCode/codeSystem/text()",
"column.xpath.ugcodeSystemName"="Document/xxx/entryInfo/productCode/codeSystemName/text()",
"column.xpath.dosageForm"="Document/xxx/entryInfo/ageForm/displayName/text()",
"column.xpath.tr_code"="Document/xxx/entryInfo/productCode/translation/code/text()",
"column.xpath.tr_description"="Document/xxx/entryInfo/productCode/translation/displayName/text()",
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()",
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<Document",
"xmlinput.end"="</Document>");

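Best answer

I found the problem after digging through some code. I hit this error because I had given two of the XPath properties the same column name. The key

column.xpath.tr_codesystem

appeared twice in SERDEPROPERTIES. I changed it to codesystemname, and then it started working for me.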
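
As a minimal sketch of that fix, the DDL below is trimmed to the two columns involved. It assumes the renamed key is meant to be column.xpath.tr_codesystemname, so that it matches the tr_codesystemname column declared in the question. INFO_FIXED is a hypothetical table name used only for illustration. The SerDe class, input format, and XPath expressions are copied from the question unchanged.

```sql
-- Sketch only: INFO_FIXED is a hypothetical table name, and the table is
-- trimmed to the two columns whose property keys collided in the question.
add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE INFO_FIXED(
tr_codesystem string,
tr_codesystemname string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()",
-- In the question this key was a second "column.xpath.tr_codesystem".
-- Assumption: the intended key is the one named after the tr_codesystemname column.
"column.xpath.tr_codesystemname"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<Document",
"xmlinput.end"="</Document>");
```

The question's DDL contains a few other keys whose names differ from the declared columns (for example column.xpath.ugname against the uname column, and column.xpath.dosageForm against ageForm). The answer does not say whether those need to match, so it may be worth checking them in the same way.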

Regarding "xml - Error when parsing XML in Hive", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41438880/
