python - Creating a Spark Python step on Amazon Hadoop

Tags: python hadoop hive pyspark amazon-emr

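I am creating a Spark step on Amazon using Hadoop, but the step seems to hang indefinitely. It is not that my code is bad or raises an error; the application simply never gets past the waiting state.

This is the command I submit: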

spark-submit --deploy-mode cluster --master yarn --num-executors 5 --executor-cores 5 --executor-memory 1g s3://URL-S3/scripts/test.py

Script:

import boto3

# Connect to DynamoDB and select the target table
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TestSpark')

# Write a single test item
table.put_item(
    Item={
        'app_token': "1a",
        'advertising_id': "1b",
    }
)

This is the output I keep getting:

16/08/25 07:06:22 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:23 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:24 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:25 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:26 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:27 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:28 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:29 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:30 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:31 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:32 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:33 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:34 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:35 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:36 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:37 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:38 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:39 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:40 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:41 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:42 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)

Error log:

2016-08-25T07:30:14.769Z INFO Step created jobs: 
2016-08-25T07:30:14.769Z WARN Step failed with exitCode 1 and took 1062 seconds

Thanks!

Update: this is the error I get now, even though the module had been installed beforehand.

ImportError: No module named boto3
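
One possible reason for this ImportError: with --deploy-mode cluster the driver runs inside a YARN container on one of the cluster nodes, so boto3 has to be installed for the Python interpreter on those nodes, not only on the machine where it was installed by hand. The following is a minimal diagnostic sketch, not part of the original script. The helper names module_available and describe_environment are hypothetical; the idea is to log which interpreter is running and whether the module can be imported there.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def describe_environment(modules=("boto3",)):
    """Build a short report: interpreter path plus availability of each module."""
    lines = ["python executable: " + sys.executable]
    for name in modules:
        status = "available" if module_available(name) else "MISSING"
        lines.append(name + ": " + status)
    return "\n".join(lines)
```

Calling print(describe_environment()) at the top of test.py would put the report into the driver's stdout in the YARN container logs, which should show whether the node that actually runs the script has boto3.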

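Best answer

Your application is waiting for YARN resources. Go to the ResourceManager URL and check whether the cluster has enough free resources and whether the job is being submitted to the correct queue. The YARN ResourceManager logs will tell you why the application stays in the ACCEPTED state.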
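
The resource check described above can also be sketched in Python against the ResourceManager REST API, which exposes cluster metrics at /ws/v1/cluster/metrics. This is a rough sketch under several assumptions: the ResourceManager is reachable on port 8088 of the address in RM_URL (a placeholder), the driver uses Spark's default 1 GB of memory, and each container carries the default overhead of max(384 MB, 10% of its heap). YARN may round containers up further, so treat the number as an estimate only. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: placeholder address; replace with the master node's host name.
RM_URL = "http://localhost:8088"


def fetch_cluster_metrics(rm_url=RM_URL):
    """Return the clusterMetrics dict from the ResourceManager REST API."""
    with urlopen(rm_url + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["clusterMetrics"]


def estimate_request_mb(num_executors, executor_memory_mb, driver_memory_mb=1024):
    """Rough memory estimate for a cluster-mode job: executors plus the driver.

    Assumes a per-container overhead of max(384 MB, 10% of the heap).
    """
    def with_overhead(heap_mb):
        return heap_mb + max(384, int(heap_mb * 0.10))

    executors_mb = num_executors * with_overhead(executor_memory_mb)
    return executors_mb + with_overhead(driver_memory_mb)


def fits(metrics, needed_mb, needed_vcores):
    """Compare the estimate with the free memory and vcores YARN reports."""
    return (metrics["availableMB"] >= needed_mb
            and metrics["availableVirtualCores"] >= needed_vcores)
```

For the command in the question (5 executors, 1g each, 5 cores each), estimate_request_mb(5, 1024) gives 8448 MB, and the job asks for roughly 5 * 5 + 1 = 26 vcores including the driver. Passing the result of fetch_cluster_metrics() to fits() gives a first hint whether the cluster is simply too small for the request; the queue configuration still has to be checked separately.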

Regarding python - Creating a Spark Python step on Amazon Hadoop, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39139057/
