hadoop - How does the namenode find an empty datanode?

How does the namenode find an empty datanode when a client requests a write to the datanodes?
Which algorithm does it use?

Best answer

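The answer to your question is quite involved. In most cases, as a Hadoop user or even as an HDFS administrator, you do not need to care exactly how the NameNode decides which nodes a block gets written to. If you are really curious, though, take a look at the following resources: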

From "Anatomy of a File Write" in Hadoop: The Definitive Guide:

The client creates the file by calling create() on DistributedFileSystem (step 1 in Figure 3-3). DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it (step 2). The namenode performs various checks to make sure the file doesn’t already exist, and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.
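The namenode-side checks in the paragraph above can be pictured with a toy model. This is only a sketch: `NamespaceSketch`, its fields, and its methods are hypothetical names made up for illustration, not Hadoop classes, and the permission check is reduced to a simple set of allowed users.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy stand-in for the namenode's namespace; hypothetical, not Hadoop code.
class NamespaceSketch {
    // Maps each file path to the number of blocks allocated so far.
    private final Map<String, Integer> files = new HashMap<>();
    // Users allowed to create files (a stand-in for real permission checks).
    private final Set<String> writers = new HashSet<>();

    NamespaceSketch(String... allowedUsers) {
        for (String user : allowedUsers) {
            writers.add(user);
        }
    }

    // Mirrors step 2: check permissions and existence, then record the file.
    void create(String user, String path) throws IOException {
        if (!writers.contains(user)) {
            throw new IOException("Permission denied for user: " + user);
        }
        if (files.containsKey(path)) {
            throw new IOException("File already exists: " + path);
        }
        files.put(path, 0); // new file, no blocks associated with it yet
    }

    // Returns the block count of a file, or -1 if the file is unknown.
    int blockCount(String path) {
        return files.getOrDefault(path, -1);
    }

    public static void main(String[] args) {
        NamespaceSketch ns = new NamespaceSketch("alice");
        try {
            ns.create("alice", "/data/a.txt");
            System.out.println("blocks: " + ns.blockCount("/data/a.txt"));
            ns.create("alice", "/data/a.txt"); // second create fails
        } catch (IOException e) {
            System.out.println("create failed: " + e.getMessage());
        }
    }
}
```

Note that no datanode is involved at this point: the file is only a namespace entry until the client starts writing data.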

As the client writes data (step 3), DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, whose responsibility it is to ask the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline—we’ll assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.
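To make "picking a list of suitable datanodes" and the pipeline more concrete, here is a minimal sketch. It assumes the commonly documented default HDFS placement rule (first replica on the writer's node, second on a node in a different rack, third on another node in the same rack as the second) and skips nodes without room for a block. `PipelineSketch`, `Node`, and all method names are hypothetical. The real policy picks among candidates at random and weighs more factors such as node load; this sketch takes the first match so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model of target selection and the write pipeline; not Hadoop code.
class PipelineSketch {

    /** Hypothetical datanode descriptor. */
    static final class Node {
        final String name;
        final String rack;
        final long freeBytes;
        final List<String> stored = new ArrayList<>(); // packets received so far

        Node(String name, String rack, long freeBytes) {
            this.name = name;
            this.rack = rack;
            this.freeBytes = freeBytes;
        }
    }

    /** First unchosen node with room for a block that satisfies the rule, or null. */
    static Node pick(List<Node> cluster, List<Node> chosen, long blockSize, Predicate<Node> rule) {
        for (Node n : cluster) {
            if (!chosen.contains(n) && n.freeBytes >= blockSize && rule.test(n)) {
                return n;
            }
        }
        return null;
    }

    /** Chooses up to `replication` targets following the assumed placement rule. */
    static List<Node> chooseTargets(List<Node> cluster, Node writer, int replication, long blockSize) {
        List<Node> chosen = new ArrayList<>();
        // Replica 1: the writer's own node, if it has room.
        Node first = pick(cluster, chosen, blockSize, n -> n == writer);
        if (first != null) chosen.add(first);
        // Replica 2: a node on a different rack than the writer.
        Node second = pick(cluster, chosen, blockSize, n -> !n.rack.equals(writer.rack));
        if (second != null) {
            chosen.add(second);
            // Replica 3: another node on the same rack as replica 2.
            Node third = pick(cluster, chosen, blockSize, n -> n.rack.equals(second.rack));
            if (third != null) chosen.add(third);
        }
        // Fallback: fill remaining slots with any node that has room.
        while (chosen.size() < replication) {
            Node any = pick(cluster, chosen, blockSize, n -> true);
            if (any == null) break;
            chosen.add(any);
        }
        return new ArrayList<>(chosen.subList(0, Math.min(replication, chosen.size())));
    }

    /** Splits data into fixed-size packets, as DFSOutputStream is described to do. */
    static List<String> splitIntoPackets(String data, int packetSize) {
        List<String> packets = new ArrayList<>();
        for (int i = 0; i < data.length(); i += packetSize) {
            packets.add(data.substring(i, Math.min(data.length(), i + packetSize)));
        }
        return packets;
    }

    /** The node at `index` stores the packet, then forwards it down the pipeline. */
    static void stream(List<Node> pipeline, int index, String packet) {
        if (index >= pipeline.size()) return;
        pipeline.get(index).stored.add(packet);
        stream(pipeline, index + 1, packet);
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node("dn1", "rackA", 1000), new Node("dn2", "rackA", 1000),
            new Node("dn3", "rackB", 10),   new Node("dn4", "rackB", 1000),
            new Node("dn5", "rackB", 1000));
        List<Node> pipeline = chooseTargets(cluster, cluster.get(0), 3, 128);
        for (String packet : splitIntoPackets("abcdefg", 3)) {
            stream(pipeline, 0, packet);
        }
        for (Node n : pipeline) {
            System.out.println(n.name + " on " + n.rack + " stored " + n.stored);
        }
    }
}
```

In this model an "empty" datanode is simply one with enough free space for a block. Rack awareness, not emptiness alone, decides which of the eligible nodes end up in the pipeline.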

If you want to step through it yourself, you can also check out the source code of the latest stable ASF Hadoop release:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L346

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/21252522/
