amazon-ec2 - Hadoop on the Amazon cloud

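I am trying to get set up on the Amazon cloud to run some Hadoop MapReduce jobs, but I am struggling to create the cluster successfully. I have downloaded the EC2 files and have my certificate and key pair files, but I believe the AMI is what is giving me trouble. If I try to run a cluster with one master node and n slave nodes, I start n+1 instances using a standard compatible AMI and then run "hadoop-ec2 launch-cluster name n" in the terminal. The master node succeeds, but when the slave nodes start to launch I get an error saying "missing parameter -h (AMI missing)", and I am not entirely sure how to proceed.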

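Also, some of my jobs need changes to Hadoop's parameter settings (specifically the mapred-site.xml configuration file). Is it possible to change this file, and if so, how do I access it? Is Hadoop already installed on the Amazon machines, with this file accessible and editable?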

Thanks

Best answer

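Have you tried Amazon Elastic MapReduce? It is a simple API that brings up Hadoop clusters of a specified size on demand.
That is easier than creating your own cluster by hand.
By default, however, it shuts the cluster down once the job flow finishes, leaving the output on S3. If all you need is some number crunching, this may be the way to go.
If you need the HDFS contents stored permanently (for example, if you run HBase on top of Hadoop), you may really need your own cluster on EC2. In that case, you may find Cloudera's Hadoop distribution for Amazon EC2 useful.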
The Hadoop configuration on the nodes being launched can be changed with an Elastic MapReduce bootstrap action:

Q: How do I configure Hadoop settings for my job flow?

The Elastic MapReduce default Hadoop configuration is appropriate for most workloads. However, based on your job flow’s specific memory and processing requirements, it may be appropriate to tune these settings. For example, if your job flow tasks are memory-intensive, you may choose to use fewer tasks per core and reduce your job tracker heap size. For this situation, a pre-defined Bootstrap Action is available to configure your job flow on startup. See the Configure Memory Intensive Bootstrap Action in the Developer’s Guide for configuration details and usage instructions. An additional predefined bootstrap action is available that allows you to customize your cluster settings to any value of your choice. See the Configure Hadoop Bootstrap Action in the Developer’s Guide for usage instructions.
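
To make the mapred-site.xml part of the question more concrete, here is a minimal Python sketch of what changing a setting in a Hadoop configuration file amounts to: the file is a list of property entries, each with a name and a value, and an override either replaces an existing value or appends a new entry. The helper name set_hadoop_property, the sample file contents, and the two property names and values are assumptions chosen for illustration; they do not come from the question. Where the file lives on a node depends on the AMI or distribution in use, which the thread does not specify, and on Elastic MapReduce the bootstrap action described above is the documented way to apply such changes.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text, name, value):
    """Return the configuration XML with property `name` set to `value`.

    Hypothetical helper for illustration: it replaces the value if the
    property already exists and appends a new <property> entry otherwise.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        # No matching property was found, so add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed sample contents; a real mapred-site.xml will differ.
SAMPLE = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>"""

# Example overrides (illustrative values, not recommendations).
updated = set_hadoop_property(SAMPLE, "mapred.tasktracker.map.tasks.maximum", 1)
updated = set_hadoop_property(updated, "mapred.child.java.opts", "-Xmx1024m")
print(updated)
```

On a self-managed EC2 cluster, an edit like this would have to be applied on every node before the Hadoop daemons start, which is the kind of work a bootstrap action takes care of on Elastic MapReduce.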

As for the way you launch the cluster, please clarify:

If I'm trying to run a cluster with a master node and n slave nodes, I start n+1 instances using standard compatible AMIs and then run the code "hadoop-ec2 launch-cluster name n" in the terminal. The master node is successful, but I get an error when the slave nodes start to launch, saying "missing parameter -h (AMI missing)" and I'm not entirely sure how to progress.

How exactly are you trying to launch it? Which AMI exactly are you using?

Source: the original question, "amazon-ec2 - Hadoop on the Amazon cloud", on Stack Overflow: https://stackoverflow.com/questions/8041324/
