amazon-ec2 - Steps to create an edge node for AWS EMR

Tags: amazon-ec2, emr

I need to create an edge node (an EC2 instance) for an AWS EMR cluster. Is there a list of steps I can follow to accomplish this?

Best answer

Run the following commands as root on your EC2 instance (the edge node):

mkdir -p /usr/lib/spark
mkdir -p /usr/lib/hive-webhcat/share/hcatalog
vi /etc/profile.d/spark.sh
    export SPARK_HOME=/usr/lib/spark
    export PATH=$SPARK_HOME/bin:$PATH
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export SPARK_CONF_DIR=/etc/spark/conf

source /etc/profile.d/spark.sh
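
If you prefer to write the profile script non-interactively instead of editing it in vi, a minimal heredoc sketch with the same content:

# Write /etc/profile.d/spark.sh without an interactive editor (same variables as above)
cat > /etc/profile.d/spark.sh <<'EOF'
export SPARK_HOME=/usr/lib/spark
export PATH=$SPARK_HOME/bin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_CONF_DIR=/etc/spark/conf
EOF
source /etc/profile.d/spark.sh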
mkdir -p /etc/hadoop/conf
chown -R kylo:kylo /etc/hadoop/conf
mkdir -p /etc/spark/conf
chown -R kylo:kylo /etc/spark/conf
mkdir -p /usr/share/aws /usr/lib/sqoop /usr/lib/hadoop-yarn /usr/lib/hadoop-mapreduce /usr/lib/hadoop-hdfs /usr/lib/hadoop
chown kylo:kylo /usr/share/aws /usr/lib/sqoop /usr/lib/hadoop-yarn /usr/lib/hadoop-mapreduce /usr/lib/hadoop-hdfs /usr/lib/hadoop
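
The chown commands assume a service user named kylo already exists on the edge node (the targets appear to come from a Kylo edge-node install); if yours doesn't, a sketch to create it, or substitute your own service account:

# Create the kylo user (and default group) only if it is not already present
id kylo >/dev/null 2>&1 || useradd -m kylo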

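# Replace with your master node's private IP and an SSH key that can log in as the hadoop user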
export MASTER_PRIVATE_IP=<MASTER_NODE_IP_ADDRESS>
export PEM_FILE=/home/centos/.ssh/id_rsa
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/core-site.xml /etc/hadoop/conf
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/yarn-site.xml /etc/hadoop/conf
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/mapred-site.xml /etc/hadoop/conf
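
Since the four scp calls differ only in the file name, they can also be written as a single loop (a sketch reusing the same variables):

# Pull the Hadoop client configuration files from the master in one loop
for f in core-site.xml yarn-site.xml hdfs-site.xml mapred-site.xml; do
    scp -i "$PEM_FILE" hadoop@"$MASTER_PRIVATE_IP":/etc/hadoop/conf/"$f" /etc/hadoop/conf/
done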

rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/spark/*' /usr/lib/spark
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/sqoop/*' /usr/lib/sqoop
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop/*' /usr/lib/hadoop
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop-yarn/*' /usr/lib/hadoop-yarn
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop-mapreduce/*' /usr/lib/hadoop-mapreduce
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop-hdfs/*' /usr/lib/hadoop-hdfs
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/share/aws/*' /usr/share/aws

rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/etc/spark/conf/*' /etc/spark/conf
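
The rsync invocations differ only in the directory being synced, so the same work can be expressed as a loop (a sketch with identical options):

# Sync each library/config directory from the master using the same rsync options
SSH_OPTS="ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE"
for d in /usr/lib/spark /usr/lib/sqoop /usr/lib/hadoop /usr/lib/hadoop-yarn \
         /usr/lib/hadoop-mapreduce /usr/lib/hadoop-hdfs /usr/share/aws /etc/spark/conf; do
    rsync -avz --delete -e "$SSH_OPTS" hadoop@"$MASTER_PRIVATE_IP":"$d"/ "$d"/
done

Using a trailing slash instead of a remote glob also picks up hidden files and lets --delete behave as expected.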

echo "spark.hadoop.yarn.timeline-service.enabled false" >> /etc/spark/conf/spark-defaults.conf

Because versions may differ, you may need to run ls on the master node to find the exact name of this file.
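
One way to do that from the edge node without an interactive login (a sketch reusing the variables above):

# List the hcatalog jars on the master to confirm the exact version before copying
ssh -o StrictHostKeyChecking=no -i "$PEM_FILE" hadoop@"$MASTER_PRIVATE_IP" \
    'ls /usr/lib/hive-hcatalog/share/hcatalog/'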
scp -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3-amzn-1.jar /usr/lib/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar

Run ls to verify the JAR path (the version suffix may differ):
ls /usr/lib/spark/examples/jars/spark-examples_ <HIT TAB>
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/lib/spark/examples/jars/spark-examples_2.11-2.3.1.jar 10

Check the YARN UI to verify that it succeeded:
http://<MASTER_NODE>:8088/cluster
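
If the YARN web UI isn't reachable from your network, the same check can be done from the edge node with the YARN CLI (a sketch; the application name for the example is "Spark Pi"):

# List completed applications and check the final status of the Spark Pi run
yarn application -list -appStates FINISHED,FAILED,KILLED | grep -i spark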

A similar question on Stack Overflow: https://stackoverflow.com/questions/46245534/
