python - Hadoop streaming job fails in Python (FAILED)

Tags: python hadoop hadoop-streaming

My scripts work perfectly when I run them locally:
cat England.txt | ./mapperEngl.py | sort | ./reducerEngl.py

However, when I run:

/shared/hadoop/cur/bin/hadoop jar /shared/hadoop/cur/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -file /home/hadoop/mapperEngl.py -mapper /home/hadoop/mapperEngl.py -file /home/hadoop/reducerEngl.py -reducer /home/hadoop/reducerEngl.py -input /datadir/England.txt -output /outputdir/climateresults3.txt

I get the following error:

16/05/03 09:27:15 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
16/05/03 09:27:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/home/hadoop/mapperEngl.py, /home/hadoop/reducerEngl.py, /tmp/hadoop-unjar6814867016081507297/] [] /tmp/streamjob1585723008278678599.jar tmpDir=null
16/05/03 09:27:16 INFO client.RMProxy: Connecting to ResourceManager at mgmt-florida-poly-eth0/10.200.209.10:8032
16/05/03 09:27:16 INFO client.RMProxy: Connecting to ResourceManager at mgmt-florida-poly-eth0/10.200.209.10:8032
16/05/03 09:27:17 INFO mapred.FileInputFormat: Total input paths to process : 1
16/05/03 09:27:17 INFO mapreduce.JobSubmitter: number of splits:2
16/05/03 09:27:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459438007195_0006
16/05/03 09:27:17 INFO impl.YarnClientImpl: Submitted application application_1459438007195_0006
16/05/03 09:27:17 INFO mapreduce.Job: The url to track the job: http://mgmt-florida-poly-eth0:8088/proxy/application_1459438007195_0006/
16/05/03 09:27:17 INFO mapreduce.Job: Running job: job_1459438007195_0006
16/05/03 09:27:25 INFO mapreduce.Job: Job job_1459438007195_0006 running in uber mode : false
16/05/03 09:27:25 INFO mapreduce.Job:  map 0% reduce 0%
16/05/03 09:27:31 INFO mapreduce.Job:  map 50% reduce 0%
16/05/03 09:27:32 INFO mapreduce.Job:  map 100% reduce 0%
16/05/03 09:27:38 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/05/03 09:27:45 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/05/03 09:27:51 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

16/05/03 09:27:58 INFO mapreduce.Job:  map 100% reduce 100%
16/05/03 09:27:58 INFO mapreduce.Job: Job job_1459438007195_0006 failed with state FAILED due to: Task failed task_1459438007195_0006_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

16/05/03 09:27:58 INFO mapreduce.Job: Counters: 37
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=228560
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=29265
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Failed reduce tasks=4
                Launched map tasks=2
                Launched reduce tasks=4
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=134880
                Total time spent by all reduces in occupied slots (ms)=242432
                Total time spent by all map tasks (ms)=8430
                Total time spent by all reduce tasks (ms)=15152
                Total vcore-seconds taken by all map tasks=8430
                Total vcore-seconds taken by all reduce tasks=15152
                Total megabyte-seconds taken by all map tasks=17264640
                Total megabyte-seconds taken by all reduce tasks=31031296
        Map-Reduce Framework
                Map input records=107
                Map output records=223
                Map output bytes=9014
                Map output materialized bytes=9472
                Input split bytes=202
                Combine input records=0
                Spilled Records=223
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                CPU time spent (ms)=1540
                Physical memory (bytes) snapshot=1305165824
                Virtual memory (bytes) snapshot=5482422272
                Total committed heap usage (bytes)=2022440960
        File Input Format Counters
                Bytes Read=29063
16/05/03 09:27:58 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
[hadoop@mgmt-florida-poly ~]$

I have tried the solutions from other questions, but none of them seem to work.

So yes, I am completely stuck here.

Best answer

Crazy, but I fixed my problem by using #!/usr/bin/python instead of #!/usr/bin/python3 as the shebang line.

I think there is a problem with our Hadoop cluster configuration.
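
The answer does not say why switching interpreters helped. One thing worth ruling out is whether the interpreter named on the shebang line exists and is executable on the worker nodes. The following is only a minimal sketch of such a check: the function names shebang_interpreter and interpreter_is_usable are hypothetical helpers, not part of Hadoop or of the original scripts. It only inspects the machine it runs on, so it would have to be run on each node.

```python
import os
import shlex


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's shebang line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    # Anything after the interpreter path (e.g. "python" in
    # "#!/usr/bin/env python") is an argument, so keep only the first token.
    parts = shlex.split(first_line[2:])
    return parts[0] if parts else None


def interpreter_is_usable(script_path):
    """True if the shebang interpreter exists and is executable on this machine."""
    interpreter = shebang_interpreter(script_path)
    return (
        interpreter is not None
        and os.path.isfile(interpreter)
        and os.access(interpreter, os.X_OK)
    )
```

Calling interpreter_is_usable("/home/hadoop/reducerEngl.py") on a worker node would show whether that node can launch the script through its shebang. If it returns True and the task still exits with code 1, the Python traceback in the failed task attempt's stderr log is the next place to look.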

This question about a failing Hadoop streaming job in Python comes from Stack Overflow: https://stackoverflow.com/questions/37014958/
