I'm just getting started with Hadoop, so my question may be basic; please bear with me.
I'm reading Hadoop: The Definitive Guide and following its weather-data tutorial. When copying the data into HDFS, I get the following error:
13/09/02 16:34:35 ERROR hdfs.DFSClient: Failed to close file /user/bhushan/gz/home/bhushan/ncdc_data/ftp3.ncdc.noaa.gov/pub/data/noaa/1901.gz
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/bhushan/gz/home/bhushan/ncdc_data/ftp3.ncdc.noaa.gov/pub/data/noaa/1901.gz could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
Something is clearly wrong with my setup. When I look at the report, this is what I get:
bhushan@ubuntu:~/Documents/hadoop-1.2.1/bin$ hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
The three configuration files are as follows (all exactly as in the book):
hdfs-site.xml:
<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
core-site.xml:
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
I have formatted HDFS several times, but it didn't help.
Do I need to explicitly specify the HDFS size somewhere? From the book:
Datanodes are not involved in the initial formatting process, since the namenode manages all of the filesystem’s metadata, and datanodes can join or leave the cluster dynamically. For the same reason, you don’t need to say how large a filesystem to create, since this is determined by the number of datanodes in the cluster, which can be increased as needed, long after the filesystem was formatted.
Best answer
I think your DataNode process is not running. I'm guessing you are working with a pseudo-distributed cluster. Run the `jps` command and make sure the DataNode process is running and stays up for a while, say 4 to 5 minutes. If the DataNode never starts, or starts and then dies within a few minutes, there is a problem with the configuration. You can try the following solution.
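A quick way to check, assuming a Hadoop 1.x pseudo-distributed setup (daemon names and log paths may differ for your version and install layout):

```shell
# List running Java processes. On a healthy Hadoop 1.x pseudo-cluster
# you should see all five daemons in addition to Jps itself:
#   NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
jps

# If DataNode is missing, its log usually explains why; check the last
# lines of the DataNode log under the Hadoop logs directory:
tail -n 50 $HADOOP_HOME/logs/hadoop-$USER-datanode-*.log
```

A common symptom in the DataNode log after repeated formatting is an "Incompatible namespaceIDs" error, which the cleanup below resolves.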
Stop the cluster. Delete the DataNode's storage directory. You would have configured it with the `dfs.data.dir` property in hdfs-site.xml; if you haven't configured it, Hadoop falls back to a location under the Linux user's temp directory. Find that directory and delete it. Then start the cluster again. Try copying the file once more; it should work.
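Put together, the steps might look like this on a Hadoop 1.x pseudo-cluster. The storage path shown is an assumption based on the 1.x default (`${hadoop.tmp.dir}/dfs/data` under `/tmp/hadoop-<user>` when `dfs.data.dir` is unset); adjust it to your configuration:

```shell
# Stop all Hadoop daemons
stop-all.sh

# Remove the DataNode storage directory (default location when
# dfs.data.dir is not set in hdfs-site.xml -- verify yours first)
rm -rf /tmp/hadoop-$USER/dfs/data

# Reformat the NameNode so its namespace ID matches the fresh
# DataNode storage that will be created on startup
hadoop namenode -format

# Start everything again and confirm DataNode stays up
start-all.sh
jps
```

Note that because `/tmp` is typically cleared on reboot, it is worth setting `dfs.data.dir` (and `dfs.name.dir`) to persistent paths in hdfs-site.xml once things are working, so the filesystem survives restarts.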
Regarding "hadoop - Datanode capacity is 0 KB", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18582172/