hadoop - How does the namenode find an empty datanode?

How does the namenode find an empty datanode when a client asks to write to the datanodes?
Which algorithm does it use?

Best answer

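The answer to your question is fairly involved. In most cases, as a Hadoop user or even as an HDFS administrator, you don't need to know exactly how the NameNode decides which nodes its blocks are written to. But if you are really curious, have a look at the following resources: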

From the "Anatomy of a File Write" section of Hadoop: The Definitive Guide:

The client creates the file by calling create() on DistributedFileSystem (step 1 in Figure 3-3). DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it (step 2). The namenode performs various checks to make sure the file doesn’t already exist, and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.

As the client writes data (step 3), DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, whose responsibility it is to ask the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline—we’ll assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.
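The namenode-side checks in step 2 of the quoted passage can be modeled in a few lines. This is a minimal Python sketch, not Hadoop code: `ToyNameNode`, its `writable_dirs` permission model and the returned path are hypothetical stand-ins (real HDFS uses POSIX-style permissions and its own RPC types), and Python's `IOError` stands in for Java's `IOException`.

```python
import posixpath


class ToyNameNode:
    """Hypothetical, heavily simplified model of the namenode's create() checks."""

    def __init__(self, writable_dirs):
        # Namespace: file path -> list of block IDs (a newly created file has none).
        self.namespace = {}
        # Assumed permission model: user -> set of directories they may create files in.
        self.writable_dirs = writable_dirs

    def create(self, path, user):
        # Check 1: the file must not already exist.
        if path in self.namespace:
            raise IOError(f"{path} already exists")
        # Check 2: the client must be allowed to create files in the parent directory.
        parent = posixpath.dirname(path)
        if parent not in self.writable_dirs.get(user, set()):
            raise IOError(f"permission denied: {user} cannot create files in {parent}")
        # Both checks passed: record the new file with no blocks associated with it.
        self.namespace[path] = []
        return path
```

With `nn = ToyNameNode({"alice": {"/user/alice"}})`, the call `nn.create("/user/alice/data.txt", "alice")` records the file with an empty block list; repeating it raises `IOError` because the file exists, and `nn.create("/user/alice/x.txt", "bob")` raises it because `bob` has no write access to `/user/alice`.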



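The packet flow in the second quoted paragraph (data split into packets, queued on the data queue, sent to the first datanode and forwarded down the pipeline) can be sketched as a toy in-process model. All names here (`DataNode`, `build_pipeline`, `write`) are hypothetical, the 4-byte packet size is deliberately tiny, and real HDFS details such as the ack queue, checksums, block boundaries and pipeline recovery after a failure are left out.

```python
from collections import deque


class DataNode:
    """Toy datanode: stores each packet, then forwards it to the next node in the pipeline."""

    def __init__(self, name):
        self.name = name
        self.stored = []
        self.next = None  # downstream datanode, or None for the last node in the pipeline

    def receive(self, packet):
        self.stored.append(packet)  # store the packet locally
        if self.next is not None:
            self.next.receive(packet)  # then forward it downstream


def build_pipeline(nodes):
    # Chain the datanodes chosen for the block: nodes[0] -> nodes[1] -> ...
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.next = downstream
    return nodes[0]


def write(data, nodes, packet_size=4):
    # DFSOutputStream-like step: split the data into packets on a "data queue".
    data_queue = deque(data[i:i + packet_size] for i in range(0, len(data), packet_size))
    head = build_pipeline(nodes)
    # DataStreamer-like step: drain the queue, sending every packet to the first datanode only.
    while data_queue:
        head.receive(data_queue.popleft())
```

For example, `write(b"hello hdfs!", [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")])` leaves every node holding the same three packets, `b"hell"`, `b"o hd"` and `b"fs!"`, even though the writer only ever sent data to `dn1`.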
If you want to step through it yourself, you can also check out the source code of the latest stable ASF Hadoop release:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L346
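The quoted passage only says the namenode picks "a list of suitable datanodes". That choice is made by a pluggable block placement policy whose default implementation is `BlockPlacementPolicyDefault`. Per the HDFS Architecture guide, for a replication factor of three the default puts one replica on the writer's own node when the client runs on a datanode, a second on a node in a different rack, and the third on a different node in that same remote rack. Notably, it does not search for the emptiest datanode: nodes without enough free space (among other checks) are filtered out and the rest are picked randomly within the rack constraints. The Python below is a rough sketch of that idea under those assumptions; `Node`, `remaining` and `choose_targets` are hypothetical names, and the real fallbacks (single-rack clusters, excluded nodes, load checks) are omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    remaining: int  # free bytes the datanode reports (hypothetical field)


def choose_targets(nodes, writer, block_size, rng=random):
    """Rough sketch of rack-aware placement for 3 replicas; not Hadoop's actual code.

    writer is the Node the client runs on, or None for a client outside the cluster.
    """
    # Filter, don't rank: any node with room for the block is a candidate.
    good = [n for n in nodes if n.remaining >= block_size]
    chosen = []

    def pick(matches):
        # Randomly choose one not-yet-chosen candidate that satisfies the rack constraint.
        options = [n for n in good if matches(n) and n not in chosen]
        if not options:
            raise RuntimeError("not enough suitable datanodes")
        node = rng.choice(options)
        chosen.append(node)
        return node

    # Replica 1: the writer's own node if it is a usable datanode, else (simplified) any node.
    if writer is not None and writer in good:
        chosen.append(writer)
        first = writer
    else:
        first = pick(lambda n: True)
    # Replica 2: a node on a different (remote) rack.
    second = pick(lambda n: n.rack != first.rack)
    # Replica 3: a different node on the same rack as replica 2.
    pick(lambda n: n.rack == second.rack)
    return chosen
```

With two racks of two healthy nodes each and the writer on `dn1` in `/rack1`, the result is always `dn1` followed by both `/rack2` nodes in random order; a node reporting too little free space is never a candidate at all.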

A similar question, "hadoop - How does the namenode find an empty datanode?", can be found on Stack Overflow: https://stackoverflow.com/questions/21252522/