I have a file on HDFS at the path "test/test.txt"; it is about 1.3 GB in size.
The output of the ls and du commands is:
hadoop fs -du test/test.txt
-> 1379081672 test/test.txt
hadoop fs -ls test/test.txt
->
Found 1 items
-rw-r--r-- 3 testuser supergroup 1379081672 2014-05-06 20:27 test/test.txt
I want to run a MapReduce job over this file, but when I start the job against it, the job fails with the following error:
hadoop jar myjar.jar test.TestMapReduceDriver test output
14/05/29 16:42:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/29 16:42:03 INFO input.FileInputFormat: Total input paths to process : 1
14/05/29 16:42:03 INFO mapred.JobClient: Running job: job_201405271131_9661
14/05/29 16:42:04 INFO mapred.JobClient: map 0% reduce 0%
14/05/29 16:42:17 INFO mapred.JobClient: Task Id : attempt_201405271131_9661_m_000004_0, Status : FAILED
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-428948818-namenode-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode4:50010, datanode3:50010, datanode1:50010]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:319)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:198)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:83)
at org.apache.hadoop.mapred.Ma
I tried the following command:
hadoop fs -cat test/test.txt
It gives the following error:
cat: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode3:50010, datanode1:50010, datanode4:50010]}
I also cannot copy the file:
hadoop fs -cp test/test.txt tmp
This gives the same error:
cp: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode1:50010, datanode3:50010, datanode4:50010]}
The output of the command
hdfs fsck /user/testuser/test/test.txt
is:
Connecting to namenode via http://namenode:50070
FSCK started by testuser (auth:SIMPLE) from /10.17.56.16 for path
/user/testuser/test/test.txt at Thu May 29 17:00:44 EEST 2014
Status: HEALTHY
Total size: 0 B (Total open files size: 1379081672 B)
Total dirs: 0
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0 (Total open file blocks (not validated): 21)
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 3
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 5
Number of racks: 1
FSCK ended at Thu May 29 17:00:44 EEST 2014 in 0 milliseconds
The filesystem under path /user/testuser/test/test.txt is HEALTHY
By the way, I can see the contents of the test.txt file from the web browser.
The Hadoop version is: Hadoop 2.0.0-cdh4.5.0
Best Answer
I ran into the same problem and fixed it with the following steps. In my case there were files that had been opened by Flume but never closed (I'm not sure what the cause is in your case). You need to find the names of the open files with this command:
hdfs fsck /directory/of/locked/files/ -files -openforwrite
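For the file in this question, that would presumably look something like the following (the /user/testuser/test path is taken from the fsck output above):
hdfs fsck /user/testuser/test/ -files -openforwrite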
You can then try to recover the files with the following command:
hdfs debug recoverLease -path <path-of-the-file> -retries 3
Or delete them with this command:
hdfs dfs -rmr <path-of-the-file>
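To tie the two steps together, here is a minimal shell sketch. It assumes that fsck marks each open file's report line with the word OPENFORWRITE and puts the path in the first field, and that the hdfs debug subcommand is available in your release (it may be missing from older versions such as the CDH4 line used in the question):
#!/usr/bin/env bash
# Recover the lease on every file that fsck still reports as open for write.
# DIR defaults to the directory from the question; pass another path as $1.
DIR=${1:-/user/testuser/test}
hdfs fsck "$DIR" -files -openforwrite 2>/dev/null \
  | grep OPENFORWRITE \
  | awk '{print $1}' \
  | while read -r path; do
      echo "Recovering lease on $path"
      hdfs debug recoverLease -path "$path" -retries 3
    done
If recoverLease is not available in your version, deleting the affected files as shown above and re-ingesting them is the fallback.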
Regarding "Hadoop, MapReduce - Cannot obtain block length for LocatedBlock", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23936517/