I'm writing a Pig program that loads a file whose fields are tab-delimited throughout,
e.g.: name TAB year TAB count TAB...
file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);
-- Group by type
grouped = GROUP file BY type;
-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);
group_operat = FOREACH by_type GENERATE
SUM(match_count) AS sum_m,
SUM(volume_count) AS sum_v,
(float)sum_m/sm_v;
DUMP group_operat;
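The problem is with the group_operat relation I'm trying to create. I want to sum all the match counts, sum all the volume counts, and then divide the match-count total by the volume-count total.
What am I doing wrong in the arithmetic / relation creation? The error I get is: line 7, column 11> Pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatible schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"
Thanks.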
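Best answer
Try it like this; it returns each type together with the ratio of its summed match_count to its summed volume_count.
Updated with working code.
input.txt (tab-separated)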
A 2001 10 2
A 2002 20 3
B 2003 30 4
B 2004 40 1
PigScript:
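-- Load the tab-separated input (PigStorage() defaults to a tab delimiter)
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
    match_count: float, volume_count: float);
-- One tuple per type: (group, {bag of that type's rows})
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
    -- Aggregate over the columns of the bag, which is named after the relation
    sum_m = SUM(file.match_count);
    sum_v = SUM(file.volume_count);
    GENERATE group, (float)(sum_m / sum_v) AS sum_mv;
}
DUMP group_operat;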
Output:
(A,6.0)
(B,14.0)
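As for why the script in the question fails with error 1031: after GROUP, each tuple has only two fields, the key (named group) and a bag named after the grouped relation (here, file). FLATTEN(group) therefore yields a single chararray, which cannot be aliased to four fields. The following is a minimal sketch, assuming the same input.txt and schema as the script above; the alias sums is made up for illustration. It uses DESCRIBE to show the grouped schema and then aggregates over the bag's columns.

```pig
-- Sketch only: assumes the same input.txt and schema as the script above.
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
    match_count: float, volume_count: float);
grouped = GROUP file BY type;

-- Prints the schema of the grouped relation: a 'group' field holding the key
-- and a bag named 'file' holding the original rows for that key.
DESCRIBE grouped;

-- Aggregates are applied to the bag's columns (file.match_count),
-- not to a flattened 'group'. 'sums' is a hypothetical alias.
sums = FOREACH grouped GENERATE group AS type,
    SUM(file.match_count) AS sum_m,
    SUM(file.volume_count) AS sum_v;
DUMP sums;
```

With the sample input, type A sums to 30 and 5 and type B to 70 and 5, which is where the ratios 6.0 and 14.0 in the output above come from.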
Regarding "sum - PIG: sum and division, creating object", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26986892/