hadoop - HDFS under-replicated block to file mapping

Tags: hadoop hdfs

HDFS is reporting roughly 600K under-replicated blocks on the cluster because of a rack failure. Before HDFS finishes recovering, is there a way to find out which files would be affected if those blocks were lost? I cannot run `fsck /` because the cluster is very large.

Best Answer

The Namenode UI lists missing blocks, and the JMX logs list corrupt/missing blocks. The UI and JMX only show the number of under-replicated blocks, not which files they belong to.

There are two ways to find under-replicated blocks/files: using fsck, or using the WebHDFS API.

Using the WebHDFS REST API:

curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"

This returns a response containing a FileStatuses JSON object. Parse the JSON and filter for files whose replication is lower than the configured value.

Below is a sample response returned by the NameNode:

curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<PATH_OF_DIRECTORY>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)

{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
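As a sketch of the filtering step described above, the snippet below parses a FileStatuses response shaped like the sample and keeps files whose replication factor is below a chosen threshold. The embedded payload (trimmed to the fields used, with one entry deliberately set to replication 1) and the threshold of 3 are assumptions for illustration:

```python
import json

# Sample payload shaped like the WebHDFS LISTSTATUS response above,
# trimmed to the fields used here; replication values are made up.
SAMPLE = '''
{"FileStatuses":{"FileStatus":[
 {"pathSuffix":"_SUCCESS","type":"FILE","replication":3},
 {"pathSuffix":"part-m-00000","type":"FILE","replication":1},
 {"pathSuffix":"part-m-00001","type":"FILE","replication":3}]}}
'''

def under_replicated(payload: str, target: int):
    """Return pathSuffix of FILE entries whose replication is below target."""
    statuses = json.loads(payload)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses
            if s["type"] == "FILE" and s["replication"] < target]

print(under_replicated(SAMPLE, 3))  # -> ['part-m-00000']
```

In practice you would feed this function the body returned by the curl call above instead of the embedded sample.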

If there are many files, you can also list them iteratively with ?op=LISTSTATUS_BATCH&startAfter=<CHILD>.

Reference: https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Iteratively_List_a_Directory
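The LISTSTATUS_BATCH pagination loop can be sketched roughly as below. The `fetch` callable stands in for the HTTP call to the NameNode, and the in-memory pages used for demonstration are illustrative assumptions, not part of the WebHDFS API; the DirectoryListing/partialListing/remainingEntries shape follows the WebHDFS docs linked above:

```python
def list_all(fetch):
    """Drain a paginated LISTSTATUS_BATCH listing.

    `fetch(start_after)` must return a DirectoryListing dict as described
    in the WebHDFS docs; start_after is None for the first page.
    """
    entries, start_after = [], None
    while True:
        page = fetch(start_after)["DirectoryListing"]
        statuses = page["partialListing"]["FileStatuses"]["FileStatus"]
        entries.extend(statuses)
        if page["remainingEntries"] == 0:
            return entries
        # The next request would use ?op=LISTSTATUS_BATCH&startAfter=<last pathSuffix>
        start_after = statuses[-1]["pathSuffix"]

# A tiny stand-in fetcher over two in-memory "pages" for demonstration:
PAGES = {
    None: {"DirectoryListing": {
        "partialListing": {"FileStatuses": {"FileStatus": [
            {"pathSuffix": "a", "type": "FILE", "replication": 3},
            {"pathSuffix": "b", "type": "FILE", "replication": 2}]}},
        "remainingEntries": 1}},
    "b": {"DirectoryListing": {
        "partialListing": {"FileStatuses": {"FileStatus": [
            {"pathSuffix": "c", "type": "FILE", "replication": 3}]}},
        "remainingEntries": 0}},
}
print([e["pathSuffix"] for e in list_all(PAGES.get)])  # -> ['a', 'b', 'c']
```

The same filtering as in the single-page case can then be applied to the combined list of statuses.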

Regarding "hadoop - HDFS under-replicated block to file mapping", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51563718/
