hadoop - How to join and find a value in Pig?

Tags: hadoop apache-pig

How do I join these two tables and find the IDs in the NDakota region with more than 1600?

1 Alaska Robert
2 Boston Lilly
3 NDakota Michael
4 NDakota Will
5 NDakota Mark

1A 1 09/09/2012 1200
2A 2 8/9/2016 3400
3B 3 4/5/2016 2300

customers = LOAD '/home/vis/Documents/customers' using PigStorage(' ') AS(cust_id:int,region:chararray,name:chararray);

sales = LOAD '/home/vis/Documents/sales' using PigStorage(' ') 
AS(sales_id:int,cust_id:int,date:datetime,amount:int);

salesNA = FILTER customers BY region =='NDakota';

joined = JOIN sales BY cust_id,salesNA BY cust_id;

grouped = GROUP joined BY cust_id;

summed= FOREACH grouped GENERATE GROUP,SUM(sales.amount);

bigSpenders= FILTER summed BY 1$>1600;

DUMP sorted;

Running this script produces an error (originally posted as a screenshot).

Best answer

From the Apache Pig documentation:

Use the disambiguate operator ( :: ) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.



The snippet below should be enough to achieve this; let me know if you run into any problems.
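-- Load both inputs; sales_id values such as 1A are not integers, so read them as chararray.
customers = LOAD 'customers.txt' USING PigStorage(' ') AS (cust_id:int, region:chararray, name:chararray);
sales = LOAD 'sales.txt' USING PigStorage(' ') AS (sales_id:chararray, cust_id:int, date:chararray, amount:int);
-- Keep only the NDakota customers, then join them to their sales.
custNA = FILTER customers BY region == 'NDakota';
joined = JOIN sales BY cust_id, custNA BY cust_id;
-- Qualify the field with :: to say which input it came from.
req_data = FILTER joined BY sales::amount > 1600;
DUMP req_data;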
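
If the goal is the total spent per customer rather than individual sales rows, as the GROUP and SUM steps in the question suggest, the same disambiguation rule applies to the grouped data. The following is only a sketch under assumptions: the file names are the ones used in the answer above, and the aliases `total` and `bigSpenders` are illustrative.

```pig
-- Assumed inputs: space-delimited files laid out as in the question.
customers = LOAD 'customers.txt' USING PigStorage(' ')
            AS (cust_id:int, region:chararray, name:chararray);
sales     = LOAD 'sales.txt' USING PigStorage(' ')
            AS (sales_id:chararray, cust_id:int, date:chararray, amount:int);

custNA = FILTER customers BY region == 'NDakota';
joined = JOIN sales BY cust_id, custNA BY cust_id;

-- cust_id exists in both inputs, so it has to be qualified after the JOIN.
grouped = GROUP joined BY sales::cust_id;

-- The grouping key is exposed as the field "group"; the bag is named after
-- the relation that was grouped (joined), not after the original sales relation.
summed = FOREACH grouped GENERATE group AS cust_id,
                                  SUM(joined.sales::amount) AS total;

-- Filter on the named field instead of a positional reference.
bigSpenders = FILTER summed BY total > 1600;
DUMP bigSpenders;
```

With the sample rows shown in the question, only customer 3 is both in NDakota and present in the sales data, so that is the only customer this sketch would be expected to report.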

Regarding "hadoop - How to join and find a value in Pig?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42587011/
