Hadoop, MapReduce - Cannot obtain block length for LocatedBlock

Tags: hadoop mapreduce hdfs

I have a file on HDFS at the path "test/test.txt", about 1.3 GB in size.

The output of the ls and du commands is:

hadoop fs -du test/test.txt -> 1379081672 test/test.txt

hadoop fs -ls test/test.txt ->

Found 1 items
-rw-r--r--   3 testuser supergroup 1379081672 2014-05-06 20:27 test/test.txt

I want to run a MapReduce job on this file, but when I start the job against it, it fails with the following error:

hadoop jar myjar.jar test.TestMapReduceDriver test output

14/05/29 16:42:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the 
arguments. Applications should implement Tool for the same.
14/05/29 16:42:03 INFO input.FileInputFormat: Total input paths to process : 1
14/05/29 16:42:03 INFO mapred.JobClient: Running job: job_201405271131_9661
14/05/29 16:42:04 INFO mapred.JobClient:  map 0% reduce 0%
14/05/29 16:42:17 INFO mapred.JobClient: Task Id : attempt_201405271131_9661_m_000004_0, Status : FAILED
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-428948818-namenode-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode4:50010, datanode3:50010, datanode1:50010]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:319)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:198)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:83)
at org.apache.hadoop.mapred.Ma

I tried the following commands:

hadoop fs -cat test/test.txt gives the following error:

cat: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode3:50010, datanode1:50010, datanode4:50010]}

I also cannot copy the file: hadoop fs -cp test/test.txt tmp gives the same error:

cp: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode1:50010, datanode3:50010, datanode4:50010]}

The output of the hdfs fsck /user/testuser/test/test.txt command:

Connecting to namenode via http://namenode:50070
FSCK started by testuser (auth:SIMPLE) from /10.17.56.16 for path 
/user/testuser/test/test.txt at Thu May 29 17:00:44 EEST 2014
Status: HEALTHY
Total size: 0 B (Total open files size: 1379081672 B)
Total dirs: 0
Total files:    0 (Files currently being written: 1)
Total blocks (validated):   0 (Total open file blocks (not validated): 21)
Minimally replicated blocks:    0
Over-replicated blocks: 0
Under-replicated blocks:    0
Mis-replicated blocks:      0
Default replication factor: 3
Average block replication:  0.0
Corrupt blocks:     0
Missing replicas:       0
Number of data-nodes:       5
Number of racks:        1
FSCK ended at Thu May 29 17:00:44 EEST 2014 in 0 milliseconds
The filesystem under path /user/testuser/test/test.txt is HEALTHY

By the way, I can see the contents of the test.txt file from a web browser.

The Hadoop version is: Hadoop 2.0.0-cdh4.5.0

Best answer

I ran into the same problem as you and fixed it with the following steps. Some files had been opened by Flume but were never closed (I'm not sure what the cause is in your case). You need to find the names of the open files with the following command:

hdfs fsck /directory/of/locked/files/ -files -openforwrite

You can try to recover the files with the following command:

hdfs debug recoverLease -path <path-of-the-file> -retries 3 
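The hdfs debug recoverLease subcommand may not exist in older releases such as the 2.0.0-cdh4.5.0 from the question; in that case, a rough equivalent is to call DistributedFileSystem.recoverLease from Java. This is only a minimal sketch, assuming fs.defaultFS in the classpath configuration points at the cluster, and using the stuck path from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLeaseSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
            throw new IllegalStateException("Default filesystem is not HDFS");
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // The open-for-write file that blocks readers (path taken from the question)
        Path stuck = new Path("/user/testuser/test/test.txt");

        // Ask the NameNode to take over the dead writer's lease and close the file;
        // recoverLease returns true once the file is actually closed, so retry a
        // few times with a pause, similar to "-retries 3" on the CLI.
        boolean closed = dfs.recoverLease(stuck);
        for (int i = 0; i < 3 && !closed; i++) {
            Thread.sleep(5000L);
            closed = dfs.recoverLease(stuck);
        }
        System.out.println("File closed after lease recovery: " + closed);
    }
}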

Or delete them with the command:

hdfs dfs -rmr <path-of-the-file>
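For context on why this happens: while a file is still open for write, the NameNode has no finalized length for its last block, so readers have to ask the DataNodes for it (the readBlockLength call in the stack trace), and when that fails you get "Cannot obtain block length for LocatedBlock". Below is a minimal sketch of the normal write pattern in which close() finalizes the last block; the file name and content are hypothetical, only the standard FileSystem API is assumed:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClosedWriterSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical output path, just to illustrate the pattern
        Path out = new Path("test/properly-closed.txt");

        FSDataOutputStream stream = fs.create(out, true);
        try {
            stream.writeBytes("example line\n");
            // hflush makes the data visible to readers while the file is still open
            stream.hflush();
        } finally {
            // close() completes the last block so the NameNode records its final
            // length; a writer that dies before this point leaves the file in the
            // open-for-write state seen in the question
            stream.close();
        }
    }
}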

A similar question on Stack Overflow: https://stackoverflow.com/questions/23936517/
