linux - Getting 'sudo: unknown user: hadoop' and 'sudo: unable to initialize policy plugin' errors on Google Cloud Platform when running a Hadoop cluster

I am trying to deploy, on Google Cloud Platform, the sample Hadoop application that Google provides at https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop.

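I followed all the setup instructions given there step by step. I was able to set up the environment and start the cluster successfully, but I cannot run the MapReduce part. I am running this command in my terminal: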

./compute_cluster_for_hadoop.py mapreduce <project ID> <bucket name> [--prefix <prefix>]  \
--input gs://<input directory on Google Cloud Storage>  \
--output gs://<output directory on Google Cloud Storage>  \
--mapper sample/shortest-to-longest-mapper.pl  \
--reducer sample/shortest-to-longest-reducer.pl  \
--mapper-count 5  \
--reducer-count 1

I get the following error:

sudo: unknown user: hadoop
sudo: unable to initialize policy plugin
Traceback (most recent call last):
  File "./compute_cluster_for_hadoop.py", line 230, in <module>
    main()
  File "./compute_cluster_for_hadoop.py", line 226, in main
    ComputeClusterForHadoop().ParseArgumentsAndExecute(sys.argv[1:])
  File "./compute_cluster_for_hadoop.py", line 222, in ParseArgumentsAndExecute
    params.handler(params)
  File "./compute_cluster_for_hadoop.py", line 51, in MapReduce
    gce_cluster.GceCluster(flags).StartMapReduce()
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 545, in StartMapReduce
    input_dir, output_dir)
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 462, in _StartScriptAtMaster
    raise RemoteExecutionError('Remote execution error')
gce_cluster.RemoteExecutionError: Remote execution error

Since I carried out every step exactly as described, I cannot understand why this problem occurs.

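Was the 'hadoop' user never created by the scripts that ran earlier, or is there a problem with the user's permissions? Or does the problem lie somewhere else?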

Please help me resolve this error. I am stuck here and cannot proceed.

Best answer

The setup process normally creates the user 'hadoop' automatically; this is done in startup-script.sh, at lines 75-76:

# Set up user and group
groupadd --gid 5555 hadoop
useradd --uid 1111 --gid hadoop --shell /bin/bash -m hadoop

If that user does not exist on the instance, some part of the setup has probably failed.
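One way to confirm whether the account was actually created is to look it up on the instance itself. The following is a minimal sketch, not part of the sample application: it assumes you can open a shell on the cluster instance and run Python there, and the function name check_account is made up for illustration. The expected UID 1111 and GID 5555 are taken from the startup-script.sh lines quoted above.

```python
import grp
import pwd

# Values taken from the groupadd/useradd lines quoted from startup-script.sh.
EXPECTED_USER = "hadoop"
EXPECTED_UID = 1111
EXPECTED_GROUP = "hadoop"
EXPECTED_GID = 5555


def check_account(user=EXPECTED_USER, uid=EXPECTED_UID,
                  group=EXPECTED_GROUP, gid=EXPECTED_GID):
    """Return a list of problems found; an empty list means the account looks fine.

    Hypothetical helper for illustration only.
    """
    problems = []

    # The group is created first by the startup script, so check it first.
    try:
        group_entry = grp.getgrnam(group)
        if group_entry.gr_gid != gid:
            problems.append(
                f"group {group!r} has gid {group_entry.gr_gid}, expected {gid}")
    except KeyError:
        problems.append(f"group {group!r} does not exist")

    # pwd.getpwnam raises KeyError when the user is unknown, which is the
    # same condition that makes sudo print "unknown user".
    try:
        user_entry = pwd.getpwnam(user)
        if user_entry.pw_uid != uid:
            problems.append(
                f"user {user!r} has uid {user_entry.pw_uid}, expected {uid}")
    except KeyError:
        problems.append(f"user {user!r} does not exist")

    return problems


if __name__ == "__main__":
    found = check_account()
    if found:
        for problem in found:
            print(problem)
    else:
        print("hadoop user and group are present with the expected ids")
```

If this reports that the user or group does not exist, the startup script most likely did not run to completion on that instance, and its output is the next place to look.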

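That said, the example you reference is still useful as a starting point if you are writing your own Python application that interacts directly with the GCE API, but it is deprecated as a way to deploy Hadoop on Google Compute Engine. If you really want to use Hadoop, you should use the Google-supported deployment tool bdutil and its associated quickstart. The clusters it deploys share some similarities, including the setup of the user hadoop. A key difference, however, is that bdutil also includes and configures the GCS connector for Hadoop, so that your MapReduce can operate directly on data in GCS without first copying it to HDFS.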

Regarding "linux - Getting 'sudo: unknown user: hadoop' and 'sudo: unable to initialize policy plugin' errors on Google Cloud Platform when running a Hadoop cluster", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26737700/
