hadoop - Trying to get Hadoop working in pseudo-distributed mode: connection refused and other errors

Tags: hadoop ssh hdfs

I have installed Hadoop 2.7.3 on my Linux Mint 17.1 machine and am following the Apache tutorial to get it running. I have followed the instructions on that page closely, and I have reached the point where I can ssh into localhost and run start-dfs.sh and start-yarn.sh. I have also formatted the namenode.
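
For reference, these are the tutorial steps roughly as I ran them (a sketch from memory; paths are relative to my Hadoop installation directory, as in the tutorial):

ssh localhost                # confirm passwordless ssh works
bin/hdfs namenode -format    # format the filesystem
sbin/start-dfs.sh            # start NameNode, SecondaryNameNode, and DataNode
sbin/start-yarn.sh           # start ResourceManager and NodeManager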

My core-site.xml file is edited according to the tutorial:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

As is my hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

However, running the command hadoop fs -mkdir /test gives the following error:

mkdir: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "MILTON/127.0.1.1"; destination host is: "localhost":9000;

Running jps gives me this output:

15388 Jps
14966 ResourceManager
14615 DataNode
15077 NodeManager
14787 SecondaryNameNode

Looking through my log files, I see more detailed errors and warnings that may be related. In hadoop-user-namenode-MILTON.log I see this error:

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-user/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)

In hadoop-user-secondarynamenode-MILTON.log I see the full stack trace of the exception I hit on the command line:

java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "MILTON/127.0.1.1"; destination host is: "localhost":9000; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy9.getTransactionId(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.getTransactionID(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:641)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:649)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:393)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
    at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:2207)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:2165)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:2295)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:2290)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
    at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1085)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)

Earlier in the log I also saw this exception:

java.net.ConnectException: Call From MILTON/127.0.1.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy9.getTransactionId(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.getTransactionID(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:641)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:649)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:393)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 18 more

Meanwhile, the datanode log repeatedly records messages like this one:

2017-03-14 20:50:37,785 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: localhost/127.0.0.1:9000

I also noticed that I can't reach the namenode's web interface as the tutorial suggests; going to http://localhost:9870/ gives me an ERR_CONNECTION_REFUSED. I'm guessing the namenode isn't starting at all; when I run stop-dfs.sh, it says "localhost: no namenode to stop". What am I doing wrong that makes the steps in the tutorial fail for me?
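
One way to confirm that suspicion (a hedged diagnostic sketch, not part of the tutorial; it assumes jps and netstat are available, as they are on Mint 17.1):

jps                                  # NameNode should appear in this list, but doesn't
sudo netstat -tlnp | grep ':9000'    # show which process, if any, is listening on 9000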

Note: I know many similar questions have already been posted. What they tend to have in common is that their setups are more complicated than mine (e.g. fully-distributed mode, rather than just running the basic tutorial), or that they found a solution that departs from the tutorial. I am interested in learning, specifically, why I can't get the tutorial to work for me, and in fixing that before applying other changes I don't yet understand. Unless the Apache tutorial really does just have it wrong.

Update, March 20, 2017: I followed franklinsijo's advice; after adding the properties he recommended and formatting the namenode, I got this exception in my namenode log file:

java.net.BindException: Problem binding to [localhost:9000] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
    at org.apache.hadoop.ipc.Server.bind(Server.java:425)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:574)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:2215)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
    at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
    at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:345)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:674)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:647)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:463)
    at sun.nio.ch.Net.bind(Net.java:455)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.apache.hadoop.ipc.Server.bind(Server.java:408)
    ... 13 more

Commenting out the "Port 9000" line in my /etc/ssh/sshd_config file does not change this exception.
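
A BindException means some other process already holds the port. A quick way to identify it (a sketch; it assumes lsof is installed, and note that edits to sshd_config only take effect after the SSH daemon restarts):

sudo lsof -i :9000          # list the process currently bound to port 9000
sudo service ssh restart    # apply the sshd_config change; rerun lsof to confirm 9000 is free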

Best Answer

By default, hadoop.tmp.dir is /tmp/hadoop-${user.name}.

hadoop.tmp.dir is used as the base directory for dfs.namenode.name.dir, whose default is file://${hadoop.tmp.dir}/dfs/name. Files in /tmp are lost on restart, so the namenode directory you formatted earlier is no longer available after a reboot.
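
You can verify the effective values yourself (a sketch; hdfs getconf -confKey is available in Hadoop 2.7.x):

hdfs getconf -confKey hadoop.tmp.dir          # defaults to /tmp/hadoop-${user.name}
hdfs getconf -confKey dfs.namenode.name.dir   # defaults to file://${hadoop.tmp.dir}/dfs/name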

Without a consistent storage directory, the NameNode daemon cannot be started, hence the exception:

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-user/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible

Either add this property to core-site.xml:

<property>
   <name>hadoop.tmp.dir</name>
   <value>/home/user/dfs</value> <!-- any existing directory outside /tmp that the user running Hadoop can write to -->
</property>

Or explicitly define the directories used to store the metadata and the data blocks by adding these properties to hdfs-site.xml:

<property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/user/dfs/name</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/user/dfs/data</value>
</property>

Note: you can also do both.

Then format the NameNode and start the services. The other errors come from the services that are unable to contact the NameNode daemon (since it isn't running). The web interface is reachable only when the NameNode is up.
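
The sequence would look roughly like this (a sketch; re-formatting erases any existing HDFS data, which should be fine for a fresh tutorial setup):

sbin/stop-dfs.sh            # stop whatever daemons are still running
bin/hdfs namenode -format   # format the new dfs.namenode.name.dir
sbin/start-dfs.sh           # start the daemons; verify with jps that NameNode stays up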

Source question on Stack Overflow: https://stackoverflow.com/questions/42799879/
