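HDFS reports that about 600K blocks on the cluster are under-replicated because of a rack failure. Is there a way to find out, before HDFS recovers, which files would be affected if these blocks were lost? I can't run "fsck /" because the cluster is very large.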
Best answer
The NameNode UI lists missing blocks, and the JMX metrics report corrupt/missing blocks. For under-replicated blocks, however, the UI and JMX show only a count.
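If you want those counts programmatically, here is a minimal sketch that queries the NameNode's /jmx servlet and picks out the FSNamesystem bean. It assumes the standard `Hadoop:service=NameNode,name=FSNamesystem` bean with its `UnderReplicatedBlocks`, `MissingBlocks` and `CorruptBlocks` attributes; `parse_block_counts` and `fetch_block_counts` are hypothetical helper names. It returns only counts, which is the limitation described above.

```python
import json
import urllib.request

# Assumption: standard NameNode /jmx servlet and FSNamesystem bean name.
FSNAMESYSTEM_QUERY = "Hadoop:service=NameNode,name=FSNamesystem"
COUNTERS = ("UnderReplicatedBlocks", "MissingBlocks", "CorruptBlocks")


def parse_block_counts(jmx_json: str) -> dict:
    """Extract the block counters from a /jmx response body."""
    beans = json.loads(jmx_json).get("beans", [])
    for bean in beans:
        if bean.get("name") == FSNAMESYSTEM_QUERY:
            return {key: bean.get(key) for key in COUNTERS}
    # Bean not present in the response.
    return {}


def fetch_block_counts(nn_host: str, http_port: int) -> dict:
    """Query the NameNode's /jmx servlet for the FSNamesystem bean only."""
    url = f"http://{nn_host}:{http_port}/jmx?qry={FSNAMESYSTEM_QUERY}"
    with urllib.request.urlopen(url) as resp:
        return parse_block_counts(resp.read().decode("utf-8"))
```

Calling `fetch_block_counts("<NN_HOST>", <HTTP_PORT>)` with your NameNode's HTTP address returns a dict of the three counters.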
There are two ways to see which blocks/files are under-replicated: fsck or the WebHDFS API.
Using the WebHDFS REST API:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
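This returns a response containing a FileStatuses JSON object. Parse it and keep the files whose replication is lower than the configured value.
Below is a sample response from the NameNode (NN):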
curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<PATH_OF_DIRECTORY>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)
{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
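The parse-and-filter step can be sketched as follows. `under_replicated` is a hypothetical helper, and the default of 3 is an assumption matching the stock `dfs.replication` value; pass your cluster's configured value instead. Note that the `replication` field in FileStatus is the replication factor recorded for the file, not a live count of replicas, so this filter reflects the file's target factor rather than which blocks actually lost copies in the rack failure.

```python
import json


def under_replicated(liststatus_json: str, expected_replication: int = 3) -> list:
    """Return pathSuffix of FILE entries whose replication is below expected_replication."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [
        s["pathSuffix"]
        for s in statuses
        # Skip directories; compare the recorded replication factor.
        if s.get("type") == "FILE" and s.get("replication", 0) < expected_replication
    ]
```

Applied to the sample response above, every entry has `"replication":3`, so the result is an empty list.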
If a directory contains many files, you can also list them iteratively with ?op=LISTSTATUS_BATCH&startAfter=<CHILD>.
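A minimal pagination sketch, assuming the LISTSTATUS_BATCH response shape `{"DirectoryListing": {"partialListing": {"FileStatuses": {"FileStatus": [...]}}, "remainingEntries": N}}` and that `startAfter` takes the `pathSuffix` of the last entry from the previous batch. `iter_liststatus_batch` and `http_get` are hypothetical names; `base_url` stands for `http://<NN_HOST>:<HTTP_PORT>` and `path` must start with `/`. The `fetch` parameter is injectable so the loop can be exercised without a cluster.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def http_get(url: str) -> str:
    """Plain GET returning the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def iter_liststatus_batch(base_url: str, path: str,
                          fetch: Callable[[str], str] = http_get) -> Iterator[dict]:
    """Yield FileStatus dicts for `path`, requesting one batch at a time."""
    start_after = None
    while True:
        params = {"op": "LISTSTATUS_BATCH"}
        if start_after is not None:
            params["startAfter"] = start_after
        url = f"{base_url}/webhdfs/v1{path}?{urllib.parse.urlencode(params)}"
        listing = json.loads(fetch(url))["DirectoryListing"]
        batch = listing["partialListing"]["FileStatuses"]["FileStatus"]
        yield from batch
        # Stop when the NameNode reports nothing left (or returns an empty batch).
        if listing.get("remainingEntries", 0) == 0 or not batch:
            break
        start_after = batch[-1]["pathSuffix"]
```

Each yielded entry has the same fields as in the LISTSTATUS sample, so the replication filter can be applied to the stream directly.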
Regarding "hadoop - HDFS under-replicated blocks to file mapping", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51563718/