hadoop - Pig "MAX" function with pig-0.12.1 and pig-0.13.0 on Hadoop-2.4.0

Tags: hadoop apache-pig

I have a Pig script that I got from Hortonworks, and it works fine with pig-0.9.2.15 and Hadoop-1.0.3.16. But when I run it on Hadoop-2.4.0 with pig-0.12.1 (recompiled with -Dhadoopversion=23) or with pig-0.13.0, it fails.

The following line seems to be the problem.

max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;

Here is the entire script.
batting = load 'pig_data/Batting.csv' using PigStorage(',');
runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
grp_data = GROUP runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
STORE join_data INTO './join_data';

Here is the Hadoop error message:

2014-07-29 18:03:02,957 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grp_data: Local Rearrange[tuple]{bytearray}(false) - scope-34 Operator Key: scope-34): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error executing an algebraic function
2014-07-29 18:03:02,958 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!



If I still want to use the MAX function, how can I fix this? Thanks!

Here is the complete output:

14/07/29 17:50:11 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-07-29 17:50:12,104 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-07-29 17:50:12,104 [main] INFO org.apache.pig.Main - Logging error messages to: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
2014-07-29 17:50:13,050 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-07-29 17:50:13,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-29 17:50:13,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:13,415 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://namenode.cmda.hadoop.com:8020
2014-07-29 17:50:14,302 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: namenode.cmda.hadoop.com:8021
2014-07-29 17:50:14,990 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:15,570 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:15,665 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
2014-07-29 17:50:15,705 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2014-07-29 17:50:15,791 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,GROUP_BY
2014-07-29 17:50:15,873 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-07-29 17:50:16,319 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-07-29 17:50:16,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-07-29 17:50:16,410 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POPackage(JoinPackager)
2014-07-29 17:50:16,417 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2014-07-29 17:50:16,493 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:16,575 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at namenode.cmda.hadoop.com/10.0.3.1:8050
2014-07-29 17:50:16,973 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-07-29 17:50:17,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-07-29 17:50:17,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-07-29 17:50:17,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-07-29 17:50:17,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-07-29 17:50:17,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-07-29 17:50:17,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=6398990
2014-07-29 17:50:17,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-07-29 17:50:17,067 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-07-29 17:50:17,068 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-07-29 17:50:17,068 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2337803902169382273.jar
2014-07-29 17:50:20,957 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2337803902169382273.jar created
2014-07-29 17:50:20,957 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-07-29 17:50:21,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2014-07-29 17:50:21,036 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-07-29 17:50:21,036 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-07-29 17:50:21,046 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-07-29 17:50:21,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-07-29 17:50:21,311 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-07-29 17:50:21,332 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at namenode.cmda.hadoop.com/10.0.3.1:8050
2014-07-29 17:50:21,366 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:22,606 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-07-29 17:50:22,606 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-07-29 17:50:22,629 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-07-29 17:50:22,729 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-07-29 17:50:22,745 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:23,026 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1406677482986_0003
2014-07-29 17:50:23,258 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1406677482986_0003
2014-07-29 17:50:23,340 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://namenode.cmda.hadoop.com:8088/proxy/application_1406677482986_0003/
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1406677482986_0003
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases batting,grp_data,max_runs,runs
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: batting[3,10],runs[5,7],max_runs[7,11],grp_data[6,11] C: max_runs[7,11],grp_data[6,11] R: max_runs[7,11]
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://namenode.cmda.hadoop.com:50030/jobdetails.jsp?jobid=job_1406677482986_0003
2014-07-29 17:50:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-07-29 17:50:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1406677482986_0003]
2014-07-29 17:51:15,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-07-29 17:51:15,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1406677482986_0003]
2014-07-29 17:51:18,582 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-07-29 17:51:18,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1406677482986_0003 has failed! Stop running all dependent jobs
2014-07-29 17:51:18,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-07-29 17:51:18,825 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grp_data: Local Rearrange[tuple]{bytearray}(false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error executing an algebraic function
2014-07-29 17:51:18,825 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2014-07-29 17:51:18,826 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
2.4.0          0.13.0      root    2014-07-29 17:50:16  2014-07-29 17:51:18  HASH_JOIN,GROUP_BY

Failed!

Failed Jobs:
JobId  Alias  Feature  Message  Outputs
job_1406677482986_0003  batting,grp_data,max_runs,runs  MULTI_QUERY,COMBINER  Message: Job failed!

Input(s):
Failed to read data from "hdfs://namenode.cmda.hadoop.com:8020/user/root/pig_data/Batting.csv"

Output(s):

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1406677482986_0003 -> null, null

2014-07-29 17:51:18,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-07-29 17:51:18,827 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2106: Error executing an algebraic function
Details at logfile: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
2014-07-29 17:51:18,828 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job scope-58 failed, hadoop does not return any error message
Details at logfile: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log

Best answer

Try casting the result of the MAX function:

max_runs = FOREACH grp_data GENERATE group as grp, (int)MAX(runs.runs) as max_runs;

Hopefully that works.
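A related variant, offered only as a sketch: the log above reports IMPLICIT_CAST_TO_DOUBLE, which suggests that the runs column reaches MAX as an untyped bytearray because the LOAD statement declares no schema. Assuming column $8 holds integer run counts (an assumption about Batting.csv, not something the post states), the cast can be applied when the column is projected rather than on the result of MAX. Whether this clears ERROR 2106 on a given cluster is not confirmed by the original thread.

```pig
-- Sketch: give the runs column an explicit type before grouping,
-- so MAX receives ints instead of untyped bytearrays.
-- Assumption: column $8 of Batting.csv contains integer run counts.
batting = LOAD 'pig_data/Batting.csv' USING PigStorage(',');
runs = FOREACH batting GENERATE $0 AS playerID, $1 AS year, (int)$8 AS runs;
grp_data = GROUP runs BY (year);
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
```

With this change both sides of the later JOIN compare an int to an int, so the rest of the script can stay as written. Values that cannot be converted to int, such as a header row if the file has one, become null after the cast, and MAX skips nulls.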

Source: this question on Stack Overflow, https://stackoverflow.com/questions/25026992/
