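I have two relations with defined schemas. I want to get only the records from relation A that do not exist in relation B (see the middle-left visualization in this post).
I tried the two variants below without success, because both return the error shown below. How can I perform this kind of operation in Pig?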
"ERROR 1200 mismatched input 'WHERE' expecting SEMI-COLON."
join_result = JOIN relationA by (project_id, sequence_id) LEFT OUTER, relationB by (project_id, sequence_id) WHERE relationB (project_id, sequence_id)is null;
join_result = JOIN relationA by (project_id, sequence_id) LEFT OUTER, relationB by (project_id, sequence_id) WHERE (relationB.project_id is null) AND (relationB.sequence_id is null);
Best answer
Pig's JOIN has no WHERE clause. You have to use FILTER to remove records based on a condition.
join_result = JOIN relationA by (project_id, sequence_id) LEFT OUTER, relationB by (project_id, sequence_id);
final_result = FILTER join_result BY (relationB::project_id is null AND relationB::sequence_id is null);
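For context, here is a minimal end-to-end sketch of the same JOIN-then-FILTER pattern. The input file names, the tab delimiter, and the `payload` field are assumptions made up for illustration; they are not from the question. Note that after a JOIN, Pig addresses fields with the disambiguation operator `::` (for example `relationB::project_id`), whereas `relationB.project_id` would be read as a scalar projection of the whole relation.

```pig
-- Hypothetical inputs: file names, delimiter, and the 'payload' field are assumptions.
relationA = LOAD 'relationA.tsv' USING PigStorage('\t')
            AS (project_id:chararray, sequence_id:chararray, payload:chararray);
relationB = LOAD 'relationB.tsv' USING PigStorage('\t')
            AS (project_id:chararray, sequence_id:chararray);

-- Keep every row of A; rows with no match in B get nulls in B's columns.
join_result = JOIN relationA BY (project_id, sequence_id) LEFT OUTER,
                   relationB BY (project_id, sequence_id);

-- Keep only the unmatched rows, i.e. those where B's join keys are null.
only_in_a = FILTER join_result BY (relationB::project_id IS NULL
                                   AND relationB::sequence_id IS NULL);

-- Project back to A's original columns so B's all-null columns are dropped.
final_result = FOREACH only_in_a GENERATE
    relationA::project_id  AS project_id,
    relationA::sequence_id AS sequence_id,
    relationA::payload     AS payload;

DUMP final_result;
```

The final FOREACH is optional. It only removes the columns from relationB, which are all null once the filter has been applied.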
Source: a similar question on Stack Overflow, "How to write a left outer join with a WHERE clause in Pig Latin?" (apache-pig): https://stackoverflow.com/questions/37274726/