hadoop - Apache Pig - ILLUSTRATE command error

Tags: hadoop apache-pig high-level

$ cat webaccess.txt
mark,yahoo.com,6
sam,google.com,7
john,yahoo.com,3
patrick,cnn.com,8
mary,facebook.com,1
mark,yahoo.com,4
john,bbc.com,10
andrew,twitter.com,3
patrick,twitter.com,9

I am running the following task in the Hue Pig shell (Grunt) on the Cloudera Quick VM:

grunt> stage1 = LOAD '/user/cloudera/webaccess.txt' USING PigStorage(',') AS (name:chararray, website:chararray, access:int);
grunt> DUMP stage1;
grunt> stage2 = FILTER stage1 by access >= 8;
grunt> stage3 = GROUP stage1 by name;
grunt> stage4 = FOREACH stage3 GENERATE group as GROUPS, MAX(stage1.access);
grunt> DUMP stage4;

Output:

(sam,7)
(john,10)
(mark,6)
(mary,1)
(andrew,3)
(patrick,9)

Everything works fine up to this point.

When I run the ILLUSTRATE command to inspect the relation stage4, I get the error shown below:

grunt> ILLUSTRATE stage4;

2014-10-07 04:02:43,639 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-07 04:02:43,642 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost.localdomain:8020
2014-10-07 04:02:43,643 [main] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-10-07 04:02:43,643 [main] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2014-10-07 04:02:43,643 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost.localdomain:8021
2014-10-07 04:02:43,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-10-07 04:02:43,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-10-07 04:02:43,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-10-07 04:02:43,804 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-10-07 04:02:43,805 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration.
java.lang.RuntimeException: Internal error creating job configuration.
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:160)
at org.apache.pig.PigServer.getExamples(PigServer.java:1182)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:739)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2014-10-07 04:02:43,868 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception
Details at logfile: /dev/null

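I am still learning Pig, and this error is keeping me from moving on to the next topic.

Also, before starting this task, when I first opened the Hue Pig shell (Grunt), I saw the following warnings: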

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit.
which: no hadoop in ((null))
which: no /usr/lib/hadoop/bin/hadoop in ((null))
dirname: missing operand
Try `dirname --help' for more information.
2014-10-07 03:18:27,802 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.7.0 (rexported) compiled May 28 2014, 11:05:48
2014-10-07 03:18:27,803 [main] INFO org.apache.pig.Main - Logging error messages to: /dev/null
2014-10-07 03:18:28,758 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/cloudera/.pigbootup not found
2014-10-07 03:18:30,436 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-07 03:18:30,444 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost.localdomain:8020
2014-10-07 03:18:37,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost.localdomain:8021
2014-10-07 03:18:37,842 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS

Best answer

I did not run into any problem; the ILLUSTRATE command works fine for me. Could you try running it in local mode first?

    $ pig -x local
    grunt> stage1 = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, website:chararray, access:int);
    grunt> stage2 = FILTER stage1 by access >= 8;
    grunt> stage3 = GROUP stage1 by name;
    grunt> stage4 = FOREACH stage3 GENERATE group as GROUPS, MAX(stage1.access);
    grunt> DUMP stage4;
    (sam,7)
    (john,10)
    (mark,6)
    (mary,1)
    (andrew,3)
    (patrick,9)
    grunt> ILLUSTRATE stage4;
    ----------------------------------------------------------------------------
    | stage1     | name:chararray     | website:chararray     | access:int     | 
    ----------------------------------------------------------------------------
    |            | john               | yahoo.com             | 3              | 
    |            | john               | bbc.com               | 10             | 
    ----------------------------------------------------------------------------
    --------------------------------------------------------------------------------------------------------------------------
    | stage3     | group:chararray     | stage1:bag{:tuple(name:chararray,website:chararray,access:int)}                     | 
    --------------------------------------------------------------------------------------------------------------------------
    |            | john                | {(john, yahoo.com, 3), (john, bbc.com, 10)}                                         | 
    |            | john                | {(john, yahoo.com, 3), (john, bbc.com, 10)}                                         | 
    --------------------------------------------------------------------------------------------------------------------------
    ------------------------------------------------
    | stage4     | GROUPS:chararray     | :int     | 
    ------------------------------------------------
    |            | john                 | 10       | 
    ------------------------------------------------
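
To make the ILLUSTRATE tables above easier to read, here is a minimal Python sketch of what stage3 and stage4 compute. It is only an analogue, not how Pig executes the script: the helper names `group_by_name` and `max_access` are made up for illustration, and the rows are copied from the webaccess.txt listing in the question. Note that stage3 groups stage1, not the filtered stage2, which is why values below 8 still appear in the result.

```python
from collections import defaultdict

# Rows copied from webaccess.txt in the question: (name, website, access).
ROWS = [
    ("mark", "yahoo.com", 6),
    ("sam", "google.com", 7),
    ("john", "yahoo.com", 3),
    ("patrick", "cnn.com", 8),
    ("mary", "facebook.com", 1),
    ("mark", "yahoo.com", 4),
    ("john", "bbc.com", 10),
    ("andrew", "twitter.com", 3),
    ("patrick", "twitter.com", 9),
]


def group_by_name(rows):
    """Rough analogue of `GROUP stage1 BY name`: maps each name to a bag of full rows."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    return dict(bags)


def max_access(bags):
    """Rough analogue of `MAX(stage1.access)` evaluated once per group."""
    return {
        name: max(access for _, _, access in bag)
        for name, bag in bags.items()
    }


stage3 = group_by_name(ROWS)   # hypothetical stand-in for the stage3 relation
stage4 = max_access(stage3)    # hypothetical stand-in for the stage4 relation

print(stage3["john"])          # the bag ILLUSTRATE shows for the john group
print(sorted(stage4.items()))  # one (name, max access) pair per group
```

The per-name maxima from this sketch should agree with the DUMP stage4 output, apart from row order, which Pig does not guarantee.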

Regarding hadoop - Apache Pig - ILLUSTRATE command error, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26234833/
