hadoop - Finding the maximum record per group using Pig

Tags: hadoop apache-pig

Using Pig, I want to find the player who scored the most runs against each team.

Input: the records look like this:
Sachin 100 KXIP Hyderabad 1991
sehwag 150 KXIP Hyderabad 1991
Sehwag 100 MI Mumbai 2011
Kohli 0 CSK Chennai 2014
Dhoni 150 MI Hyderabad 1991
Sachin 32 PW Chennai 2014
Dhoni 150 MI Mumbai 2011

My implementation:
record1= LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
record2 = GROUP record1 by team as team;
record3 = FOREACH record2 GENERATE group,MAX(record1.runs) as mx;
record4= ORDER record3 by mx ASC;
DUMP record4;

Output:
(PW,32)
(KXIP,150)
(MI,150)

But I expect the result in the following form, with the player's other fields included:
Sachin PW 32 Chennai 2014

Best answer

-- Load the space-delimited records.
record1 = LOAD 'ipl.txt' USING PigStorage(' ') AS (name:chararray, runs:int, team:chararray, loc:chararray, year:int);
-- Compute the maximum runs per team.
record2 = GROUP record1 BY team;
record3 = FOREACH record2 GENERATE group, MAX(record1.runs) AS mx;
-- Join the per-team maximum back to the original rows to recover the other fields.
record4 = JOIN record3 BY (mx, group) LEFT OUTER, record1 BY (runs, team);
record5 = FOREACH record4 GENERATE record1::name AS name, record1::team AS team, record3::mx AS mx, record1::year AS year;
record6 = ORDER record5 BY mx ASC;
DUMP record6;

This produces the following result:
(Kohli,CSK,0,2014)
(Sachin,PW,32,2014)
(sehwag,KXIP,150,1991)
(Dhoni,MI,150,1991)
(Dhoni,MI,150,2011)

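Note that Dhoni appears twice because he scored 150 against MI in two different years, so both rows match the team maximum in the join. If you want only one row per team, you need to pick either the earliest or the most recent year, depending on your needs.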
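
One possible way to keep a single row per team is a nested FOREACH that sorts each group and takes the first row. The sketch below is not part of the original answer. It assumes the same 'ipl.txt' layout and breaks ties by earliest year, and the relation names by_team, sorted, best and one_per_team are made up for illustration.

```pig
-- Same load statement as in the answer above.
record1 = LOAD 'ipl.txt' USING PigStorage(' ') AS (name:chararray, runs:int, team:chararray, loc:chararray, year:int);

-- Group all rows by team.
by_team = GROUP record1 BY team;

-- Within each team, sort by runs (highest first), then by year (earliest first),
-- and keep only the first row. Assumption: ties go to the earliest year.
one_per_team = FOREACH by_team {
    sorted = ORDER record1 BY runs DESC, year ASC;
    best = LIMIT sorted 1;
    GENERATE FLATTEN(best);
};

DUMP one_per_team;
```

To prefer the most recent year instead, change `year ASC` to `year DESC`. This variant also avoids the join, since each group already carries the full rows.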

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/23925337/
