hadoop - Cannot connect to HDFS DataNode from a remote client

Tags: hadoop

I am currently experimenting with a legacy application built on Hadoop 2.3.0 (I know... don't ask). Everything works fine as long as I run the client on the same machine as the single-node Hadoop deployment. Now that I have moved the client application to another machine on the local network, I can no longer connect to the DataNode.

2018-04-02 14:33:29.661/IST WARN  [hadoop.hdfs.BlockReaderFactory] I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3044)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:744)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:659)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:327)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:574)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at java.util.zip.ZipInputStream.readFully(ZipInputStream.java:403)
at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:278)
at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:122)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:220)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:125)

And also...

2018-04-02 14:33:29.666/IST WARN  [hadoop.hdfs.DFSClient] Failed to connect to localhost/127.0.0.1:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3044)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:744)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:659)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:327)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:574)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.PushbackInputStream.read(PushbackInputStream.java:186)
at java.util.zip.ZipInputStream.readFully(ZipInputStream.java:403)
at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:278)
at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:122)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:220)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:125)

I can monitor the Hadoop deployment from a web browser on the client machine, and everything appears to be running fine.

[Screenshot: Hadoop monitoring web UI]

I have read the answers here and here, but I still get the same error. I cannot get the client to stop looking for localhost/127.0.0.1:50010 instead of the DataNode's correct IP address (or hostname).

My first concern is whether I am missing some configuration on the client application. My application connects to HDFS using a variable named HADOOP_URL, whose value is correctly set to the cluster's hostname, which in turn resolves via /etc/hosts. It is possible I am missing additional client-side configuration; any ideas here would be appreciated.
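For context, here is a minimal sketch of how a client like this typically builds its FileSystem handle from a variable such as HADOOP_URL. The variable name comes from the question; the port, the model path, and the use of an environment variable are assumptions for illustration only:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical: HADOOP_URL is assumed to hold something like "hdfs://cluster-host:8020"
        String hadoopUrl = System.getenv("HADOOP_URL");

        Configuration conf = new Configuration();
        // Points the client at the NameNode RPC address; this does not control
        // how DataNode addresses are resolved later on.
        conf.set("fs.defaultFS", hadoopUrl);

        try (FileSystem fs = FileSystem.get(URI.create(hadoopUrl), conf)) {
            // Opening a file triggers the block-location lookup and the
            // client-to-DataNode connection that fails in the stack trace above.
            // The path is a hypothetical placeholder.
            fs.open(new Path("/models/en-token.bin")).close();
        }
    }
}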

However, this answer suggests that the NameNode is the one that tells the client which hostname to use for the DataNode. This is consistent with the fact that my client can connect to the NameNode, so the client configuration itself appears to be working.

So lastly, I need to find a way for the NameNode to return the hostname I set instead of returning localhost/127.0.0.1. How do I go about fixing this?

Best Answer

So lastly, I need to find a way for the Namenode to return hostname that I set instead of returning localhost/127.0.0.1. How do I go about fixing this?

=> According to this article, this may be the configuration you need:

By default HDFS clients connect to DataNodes using the IP address provided by the NameNode. Depending on the network configuration this IP address may be unreachable by the clients. The fix is letting clients perform their own DNS resolution of the DataNode hostname. The following setting enables this behavior.

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>
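Since the problem is on a remote client, it may be enough to set this property on the client's own Configuration (or in an hdfs-site.xml on the client's classpath) rather than on the cluster. A minimal sketch, assuming a placeholder NameNode URI and port:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DatanodeHostnameClient {
    public static void main(String[] args) throws IOException {
        // Assumed NameNode URI; replace with the cluster hostname from /etc/hosts.
        String hadoopUrl = "hdfs://cluster-host:8020";

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", hadoopUrl);

        // Client-side equivalent of the hdfs-site.xml property above:
        // ask the client to resolve DataNode hostnames itself instead of
        // using the (possibly unreachable) IP returned by the NameNode.
        conf.set("dfs.client.use.datanode.hostname", "true");

        try (FileSystem fs = FileSystem.get(URI.create(hadoopUrl), conf)) {
            System.out.println("Connected to " + fs.getUri());
        }
    }
}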

The original question, "hadoop - Cannot connect to HDFS DataNode from a remote client", is on Stack Overflow: https://stackoverflow.com/questions/49613451/
