hadoop - Does deleting a file from HDFS also delete it from the replicated datanodes?

Tags: hadoop hdfs

To free up storage space, I plan to delete some files from HDFS. I have a 3-node cluster.

If I delete a file from HDFS, will it also be removed automatically from the datanodes that hold its replicas?

Best Answer

Yes, the file is also removed from the datanodes that hold its replicas, although this can take some time. To delete a file immediately, bypassing the trash, use the -skipTrash flag (a hedged sketch using the Java API follows the quoted passage below).
This link is also useful:

When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the /trash directory. The file can be restored quickly as long as it remains in /trash. A file remains in /trash for a configurable amount of time. After the expiry of its life in /trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.

A user can Undelete a file after deleting it as long as it remains in the /trash directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the /trash directory and retrieve the file. The /trash directory contains only the latest copy of the file that was deleted. The /trash directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface.
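As a concrete illustration, here is a minimal sketch using the Hadoop FileSystem Java API, assuming the Hadoop client libraries are on the classpath and core-site.xml/hdfs-site.xml point at your cluster; the path /data/old and the class name DeleteExample are hypothetical, not from the original answer. It shows the two deletion paths described above: moving a file to the per-user trash (what hdfs dfs -rm does when trash is enabled) versus deleting it right away (the -skipTrash behaviour), after which the NameNode schedules the file's blocks for removal on every datanode that holds a replica.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class DeleteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/data/old");        // hypothetical path to delete

        // Trash retention in minutes; 0 means trash is disabled and deletes are immediate.
        float trashMinutes = conf.getFloat("fs.trash.interval", 0);
        System.out.println("fs.trash.interval = " + trashMinutes + " minutes");

        // Path 1: move the file into the user's trash (roughly what "hdfs dfs -rm -r /data/old"
        // does when trash is enabled); blocks are only reclaimed once the trash entry expires.
        boolean trashed = Trash.moveToAppropriateTrash(fs, target, conf);
        System.out.println("moved to trash: " + trashed);

        // Path 2: delete immediately, bypassing trash (roughly "hdfs dfs -rm -r -skipTrash
        // /data/old"); the NameNode then tells each datanode holding a replica to drop
        // the file's blocks.
        // boolean deleted = fs.delete(target, true);   // true = recursive

        // The per-user trash lives under the home directory, e.g. /user/<name>/.Trash
        Path trashDir = new Path(fs.getHomeDirectory(), ".Trash");
        if (fs.exists(trashDir)) {
            for (FileStatus status : fs.listStatus(trashDir)) {
                System.out.println(status.getPath());
            }
        }
        fs.close();
    }
}

Note that, unlike the shell, a direct FileSystem.delete() call does not go through the trash at all, which is why the shell's -skipTrash flag and a plain API delete behave the same way.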

Regarding "hadoop - Does deleting a file from HDFS also delete it from the replicated datanodes?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52699444/
