java - java.io.IOException: Could not get input splits when running a MapReduce program

Tags: java hadoop cassandra datastax-enterprise

I am running a MapReduce program and get the following error.

14/04/22 07:44:02 INFO mapred.JobClient: Cleaning up the staging area cfs://XX.XXX.XXX.XXX/tmp/hadoop-cassandra/mapred/staging/psadmin/.staging/job_201404180932_0063
14/04/22 07:44:02 ERROR security.UserGroupInformation: PriviledgedActionException as:psadmin cause:java.io.IOException: Could not get input splits
Exception in thread "main" java.io.IOException: Could not get input splits
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:193)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at MultiOutMR.run(MultiOutMR.java:95)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at MultiOutMR.main(MultiOutMR.java:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
        at java.util.concurrent.FutureTask.report(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189)
        ... 19 more
Caused by: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSubSplits(AbstractColumnFamilyInputFormat.java:304)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.access$200(AbstractColumnFamilyInputFormat.java:60)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:226)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:211)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_splits_ex(Cassandra.java:1359)
        at org.apache.cassandra.thrift.Cassandra$Client.describe_splits_ex(Cassandra.java:1343)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSubSplits(AbstractColumnFamilyInputFormat.java:281)
        ... 7 more

Note:
Prerequisites:
DataStax Enterprise (DSE 3.2.5) with Cassandra 1.2.15.1 and Hadoop 1.0.4.9.
We have configured one data center with 4 nodes. nodetool status shows the following:
XXXXXX@XXXXXXXXX:~$ nodetool status
Datacenter: XXXXXX

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  XX.XXX.XXX.XXX  14.65 MB   25.0%  XX.XXX.XXX.XXX vm01
UN  XX.XXX.XXX.XXX  34.25 MB   25.0%  XX.XXX.XXX.XXX vm01
UN  XX.XXX.XXX.XXX  57.45 MB   25.0%  XX.XXX.XXX.XXX vm01
UN  XX.XXX.XXX.XXX  57.08 MB   25.0%  XX.XXX.XXX.XXX vm01

Can anyone help resolve this issue? Thanks in advance.

Best answer

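You need to provide more information about how you set up the Hadoop job. This looks more like a configuration issue. The TTransportException itself points to a problem on the server side: in the trace above it is thrown while the client is reading the reply to the Thrift call describe_splits_ex.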
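As a first diagnostic, it may help to confirm that the machine submitting the job can open a TCP connection to the Thrift RPC port on each Cassandra node. The sketch below is only an illustration, not part of the original answer: the class name ThriftPortProbe and the helper isReachable are hypothetical, and it assumes the nodes use Cassandra's default rpc_port of 9160 (check cassandra.yaml if that was changed). A successful connect rules out only basic network problems; the server can still drop the connection later.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: probes the Thrift RPC port on each host given on the command line.
public class ThriftPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: default Cassandra rpc_port; adjust if cassandra.yaml says otherwise.
        int rpcPort = 9160;
        for (String host : args) {
            boolean ok = isReachable(host, rpcPort, 2000);
            System.out.println(host + ":" + rpcPort + " reachable=" + ok);
        }
    }
}
```

Run it with the node addresses as arguments, for example `java ThriftPortProbe <node1> <node2>`. If every node is reachable, the next places to look are the job's input configuration and the Cassandra system log on the node that handled the request.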

Regarding java - java.io.IOException: Could not get input splits when running a MapReduce program, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23219225/
