perl - Why is my reducer failing? (Hadoop)

Tags: perl apache hadoop mapreduce

I wrote two Perl scripts to practice MapReduce. The program is supposed to count all the words in a set of text files that I put in a directory.

Here is my mapper.pl:

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

while(my $line = <>) {
    my @words = split(' ', $line);

    foreach my $word(@words) {
        print "$word \t 1\n";
    }
}

Here is my reducer.pl:
#!/bin/usr/perl

use 5.010;
use warnings;

my $currentWord = "";
my $currentCount = 0;

##Use this block for testing the reduce script with some test data.
#Open the test file
#open(my $fh, "<", "testdata.txt");
#while(!eof $fh) {}

while(my $line = <>) {
    #Remove the \n
    chomp $line;

    #Index 0 is the word, index 1 is the count value
    my @lineData = split('\t', $line);
    my $word = $lineData[0];
    my $count = $lineData[1];

    if($currentWord eq $word) {
        $currentCount = $currentCount + $count;
    } else {
        if($currentWord ne "") {
            #Output the key we're finished working with
            print "$currentWord \t $currentCount \n";
        }
        #Switch the current variables over to the next key
        $currentCount = $count;
        $currentWord = $word;
    }
}

#deal with the last loop 
print "$currentWord \t $currentCount \n";

When I run these scripts with the Hadoop streaming command:
bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countWords/mapper.pl -mapper /home/hduser/countWords/mapper.pl -file /home/hduser/countWords/reducer.pl -reducer /home/hduser/countWords/reducer.pl -input /user/hduser/testData/* -output /user/hduser/testData/output/*

I get the following error:
13/07/19 11:36:33 INFO streaming.StreamJob:  map 0%  reduce 0%
13/07/19 11:36:39 INFO streaming.StreamJob:  map 9%  reduce 0%
13/07/19 11:36:40 INFO streaming.StreamJob:  map 64%  reduce 0%
13/07/19 11:36:41 INFO streaming.StreamJob:  map 73%  reduce 0%
13/07/19 11:36:44 INFO streaming.StreamJob:  map 82%  reduce 0%
13/07/19 11:36:45 INFO streaming.StreamJob:  map 100%  reduce 0%
13/07/19 11:36:49 INFO streaming.StreamJob:  map 100%  reduce 11%
13/07/19 11:36:53 INFO streaming.StreamJob:  map 100%  reduce 0%
13/07/19 11:37:02 INFO streaming.StreamJob:  map 100%  reduce 17%
13/07/19 11:37:03 INFO streaming.StreamJob:  map 100%  reduce 33%
13/07/19 11:37:06 INFO streaming.StreamJob:  map 100%  reduce 17%
13/07/19 11:37:08 INFO streaming.StreamJob:  map 100%  reduce 0%
13/07/19 11:37:16 INFO streaming.StreamJob:  map 100%  reduce 33%
13/07/19 11:37:21 INFO streaming.StreamJob:  map 100%  reduce 0%
13/07/19 11:37:31 INFO streaming.StreamJob:  map 100%  reduce 33%
13/07/19 11:37:35 INFO streaming.StreamJob:  map 100%  reduce 17%
13/07/19 11:37:38 INFO streaming.StreamJob:  map 100%  reduce 100%
13/07/19 11:37:38 INFO streaming.StreamJob: To kill this job, run:
13/07/19 11:37:38 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=shiv0:54311 -kill job_201307031312_0065
13/07/19 11:37:38 INFO streaming.StreamJob: Tracking URL: http://shiv0:50030/jobdetails.jsp?jobid=job_201307031312_0065
13/07/19 11:37:38 ERROR streaming.StreamJob: Job not successful. Error: # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307031312_0065_r_000001
13/07/19 11:37:38 INFO streaming.StreamJob: killJob... Streaming Command Failed!

I have been trying to figure out what I am doing wrong, and I am stumped. Does anyone have suggestions on how I can diagnose this?

Best answer

bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countWords/mapper.py -mapper /home/hduser/countWords/mapper.py -file /home/hduser/countWords/reducer.py -reducer /home/hduser/countWords/reducer.py -input /user/hduser/testData/* -output /user/hduser/testData/output/*

Why are you invoking .py files? Shouldn't you be invoking the Perl files, i.e. reducer.pl instead of reducer.py?
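
One way to narrow down a failing reduce task is to exercise the same logic outside Hadoop first. The following is a minimal sketch, not the scripts from the question: it simulates the map, sort, and reduce steps in a single Perl program, with an in-memory sort standing in for Hadoop's shuffle. The sample input lines and the names map_line and reduce_sorted are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map step: emit one "word<TAB>1" record per whitespace-separated word.
sub map_line {
    my ($line) = @_;
    my @records;
    push @records, "$_\t1" for split ' ', $line;
    return @records;
}

# Reduce step: sum the counts of consecutive records that share a key.
# Assumes the records arrive sorted by key, as they do in a streaming reducer.
sub reduce_sorted {
    my @records = @_;
    my @out;
    my ($current_word, $current_count);
    for my $record (@records) {
        my ($word, $count) = split /\t/, $record;
        if (defined $current_word && $word eq $current_word) {
            $current_count += $count;
        } else {
            # Flush the key we have finished with, then start the next one.
            push @out, "$current_word\t$current_count" if defined $current_word;
            ($current_word, $current_count) = ($word, $count);
        }
    }
    # Flush the final key, if there was any input at all.
    push @out, "$current_word\t$current_count" if defined $current_word;
    return @out;
}

# Hypothetical sample input standing in for the text files.
my @input = ("the quick fox", "the lazy dog the end");

my @mapped;
push @mapped, map_line($_) for @input;
my @sorted  = sort @mapped;    # stands in for Hadoop's shuffle and sort
my @reduced = reduce_sorted(@sorted);

print "$_\n" for @reduced;
```

If a local run like this produces the expected counts but the streaming job still fails, the cause is more likely to lie in how the scripts are invoked (file names, shebang lines, permissions) than in the counting logic.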

Source: "perl - Why is my reducer failing? (Hadoop)", a similar question found on Stack Overflow: https://stackoverflow.com/questions/17750393/
