hadoop - Hadoop wordcount fails on a large file

Tags: hadoop

I am trying to run a wordcount job in Hadoop on a 43 GB file. I previously tested the system with a small file, and the job completed successfully. With the larger file, however, I repeatedly hit this error:

14/07/28 15:50:12 INFO mapreduce.Job: Running job: job_1406562550988_0001
14/07/28 15:50:31 INFO mapreduce.Job: Job job_1406562550988_0001 running in uber mode : false
14/07/28 15:50:31 INFO mapreduce.Job:  map 0% reduce 0%
14/07/28 15:51:44 INFO ipc.Client: Retrying connect to server: master/192.168.50.2:46671. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
14/07/28 15:51:45 INFO ipc.Client: Retrying connect to server: master/192.168.50.2:46671. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
14/07/28 15:51:46 INFO ipc.Client: Retrying connect to server: master/192.168.50.2:46671. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
14/07/28 15:52:06 INFO mapreduce.Job: Job job_1406562550988_0001 failed with state FAILED due to: Application application_1406562550988_0001 failed 2 times due to AM Container for appattempt_1406562550988_0001_000002 exited with  exitCode: 1 due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


.Failing this attempt.. Failing the application.
14/07/28 15:52:06 INFO mapreduce.Job: Counters: 0

My core-site.xml file is as follows:
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://master:54310</value>
        </property>
</configuration>

hdfs-site.xml:
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/ubuntu/hadoop-store/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/mnt/datanode</value>
        </property>
</configuration>

mapred-site.xml:
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapred.job.tracker</name>
                <value>master:54311</value>
        </property>
</configuration>

yarn-site.xml:
<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>

</configuration>

I am not sure what to do next. Any help would be greatly appreciated. Thanks in advance!

Edit: I changed my yarn-site.xml file as follows:
<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.application.classpath</name>
                <value>
                        $HADOOP_CONF_DIR,
                        $HADOOP_INSTALL/share/hadoop/common/*,
                        $HADOOP_INSTALL/share/hadoop/common/lib/*,
                        $HADOOP_INSTALL/share/hadoop/hdfs/*,
                        $HADOOP_INSTALL/share/hadoop/hdfs/lib/*,
                        $HADOOP_INSTALL/share/hadoop/mapreduce/*,
                        $HADOOP_INSTALL/share/hadoop/mapreduce/lib/*,
                        $HADOOP_INSTALL/share/hadoop/yarn/*,
                        $HADOOP_INSTALL/share/hadoop/yarn/lib/*
                </value>
        </property>
</configuration>

My stderr file now shows:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.CompositeService
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more

Best Answer

It looks like the Hadoop jars are not visible to the launched containers. Check your $HADOOP_CONF_DIR, $HADOOP_HOME, $HADOOP_INSTALL, and $YARN_HOME environment variables, or specify the library paths directly in yarn-site.xml.
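A minimal shell sketch of that check, assuming a tarball install under /usr/local/hadoop (a hypothetical path; adjust for your cluster). It prints the variables that YARN expands inside yarn.application.classpath, exports them if they are missing, and verifies that a jar providing org.apache.hadoop.service.CompositeService is actually on disk:

```shell
# Print the variables referenced in yarn.application.classpath;
# an empty value here means the AM container cannot resolve the Hadoop jars.
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
echo "HADOOP_HOME=$HADOOP_HOME"
echo "HADOOP_INSTALL=$HADOOP_INSTALL"
echo "YARN_HOME=$YARN_HOME"

# If any are empty, export them (e.g. in ~/.bashrc or etc/hadoop/hadoop-env.sh).
# /usr/local/hadoop is an assumed install prefix -- adjust to yours.
export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_HOME="$HADOOP_INSTALL"
export HADOOP_CONF_DIR="$HADOOP_INSTALL/etc/hadoop"
export YARN_HOME="$HADOOP_INSTALL"

# The missing class lives in the hadoop-common jar; confirm it exists:
ls "$HADOOP_INSTALL"/share/hadoop/common/hadoop-common-*.jar 2>/dev/null \
  || echo "hadoop-common jar not found under $HADOOP_INSTALL"
```

Note that these exports must be visible to the NodeManager process itself (containers inherit its environment), not just your client shell, so setting them in etc/hadoop/hadoop-env.sh and restarting YARN is the more reliable route.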

This question ("hadoop - Hadoop wordcount fails on a large file") is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/25012253/
