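python - Hadoop commands from Python

I'm trying to get some statistics for a directory in HDFS: the number of files/subdirectories and the size of each file. I started out thinking I could do this in bash.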

#!/bin/bash
OP=$(hadoop fs -ls hdfs://mydirectory)
# Count the lines in the captured listing
echo "$OP" | wc -l

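That's all I have so far, and I quickly realized that Python might be a better choice. However, I don't know how to execute Hadoop commands such as hadoop fs -ls from Python.

Best answer

See https://docs.python.org/2/library/commands.html for your options, including how to get the return status (in case of an error). The basic code you're missing is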

import commands

hdir_list = commands.getoutput('hadoop fs -ls hdfs://mydirectory')

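Yes: the commands module is deprecated as of 2.6 and still usable in 2.7, but it was removed in Python 3. If that bothers you, look at

os.system(<command string>)

...or, better yet, use subprocess.call (introduced in 2.4). Note that both os.system and subprocess.call return only the exit status; to capture the command's output, use subprocess.check_output (available since 2.7).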
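As a minimal sketch of the subprocess route, the following runs the listing and extracts the counts and sizes the question asks for. The helper names run_hadoop_ls and summarize_listing are hypothetical, and the parser assumes the usual eight-column layout of hadoop fs -ls output (permissions, replication, owner, group, size, date, time, path); adjust it if your Hadoop version prints something different.

```python
import subprocess


def run_hadoop_ls(path):
    """Run `hadoop fs -ls <path>` and return its standard output as text.

    Hypothetical helper; assumes the `hadoop` executable is on PATH.
    """
    # check_output raises subprocess.CalledProcessError (which carries
    # a .returncode attribute) if the command exits with a non-zero status.
    out = subprocess.check_output(["hadoop", "fs", "-ls", path])
    return out.decode("utf-8")


def summarize_listing(listing):
    """Split a listing into ({file_path: size_in_bytes}, [directory_paths]).

    Assumes eight whitespace-separated columns per entry:
    permissions, replication, owner, group, size, date, time, path.
    """
    files = {}
    dirs = []
    for line in listing.splitlines():
        # Split at most 7 times so a path containing spaces stays intact.
        fields = line.split(None, 7)
        if len(fields) < 8:
            # Skips the "Found N items" header and any blank lines.
            continue
        perms, size, path = fields[0], fields[4], fields[7]
        if perms.startswith("d"):
            dirs.append(path)
        else:
            files[path] = int(size)
    return files, dirs
```

Calling summarize_listing(run_hadoop_ls('hdfs://mydirectory')) would then give a dict of file sizes and a list of subdirectories, so len() of each gives the counts. Keeping the parsing separate from the subprocess call makes it possible to check the parser against saved listing text without a cluster.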

Source: the Stack Overflow question "Hadoop commands from Python", https://stackoverflow.com/questions/32999622/
