api - Spark job submitted - Waiting (TaskSchedulerImpl: Initial job not accepted)

Tags: api apache-spark amazon-ec2

An API is called to submit the job. Response status - Running.

On the cluster UI -

Worker (slave) - worker-20160712083825-172.31.17.189-59433 is Alive

Core 1 out of 2 used

Memory 1Gb out of 6 used

Running Applications

app-20160713130056-0020 - Waiting for 5 hrs

Cores - unlimited

Job description of the application

Active Stages

reduceByKey at /root/wordcount.py:23

Pending Stages

takeOrdered at /root/wordcount.py:26

Running Drivers -

stderr log page for driver-20160713130051-0025 

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources, the slave has not been started and therefore has no resources.

But in my case - Slave 1 is alive and working.

According to Unable to Execute More than a spark Job "Initial job has not accepted any resources", I am using deploy mode = cluster (not client), since I have 1 master and 1 slave and the submit API is called through Postman / anywhere.
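Since the job goes in over HTTP rather than through spark-submit, it may be worth checking the request body itself. Below is a minimal sketch of a cluster-mode submission against the standalone master's REST endpoint on port 6066; the field names follow Spark's undocumented standalone REST protocol, and the host, file path, and versions are assumptions taken from the rest of this question.

import requests

# Sketch of a cluster-mode submission to the standalone REST endpoint.
# Host, paths, and versions are assumptions; adjust to your cluster.
master = "ec2-54-209-108-127.compute-1.amazonaws.com"
payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/root/wordcount.py",
    "clientSparkVersion": "1.6.1",
    "mainClass": "org.apache.spark.deploy.SparkSubmit",
    "appArgs": ["/root/wordcount.py"],
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.master": "spark://%s:6066" % master,
        "spark.app.name": "MyApp",
        "spark.submit.deployMode": "cluster",
    },
}

resp = requests.post("http://%s:6066/v1/submissions/create" % master, json=payload)
print(resp.json())  # a CreateSubmissionResponse with a submissionId on success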

The cluster also has cores, RAM, and memory available - still the job throws the error reported by the UI.

According to TaskSchedulerImpl: Initial job has not accepted any resources;, I set the Spark environment variables in

~/spark-1.5.0/conf/spark-env.sh

SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2

and copied these over to the slaves:

sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh
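One detail that is easy to miss with deploy mode = cluster on a single worker: the driver itself is scheduled on that worker and occupies a core plus its own memory, so an executor requesting the default 1g no longer fits inside SPARK_WORKER_MEMORY=1000m and the application waits forever. A minimal sketch of capping the application's request so it fits (the figures are illustrative assumptions for a 2-core / 1000 MB worker, not prescriptive values):

from pyspark import SparkConf, SparkContext

# Illustrative sizes only: leave headroom for the cluster-mode driver,
# which shares the same 2-core / 1000 MB worker as the executor.
conf = (SparkConf()
        .setAppName("MyApp")
        .set("spark.cores.max", "1")           # total cores the app may claim
        .set("spark.executor.memory", "512m")  # the default 1g would not fit
        .set("spark.driver.memory", "256m"))   # driver share in cluster mode

sc = SparkContext(conf=conf)  # master URL supplied at submission time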

Everything in the answers to the questions above applies, yet no solution has been found. So, since I am using an API together with Apache Spark, perhaps some other help is needed.

Edited July 18, 2016

Wordcount.py - My PySpark application code -

from pyspark import SparkContext, SparkConf

logFile = "/user/root/In/a.txt"

conf = (SparkConf().set("num-executors", "1"))

sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
print("in here")
lines = sc.textFile(logFile)
print("text read")
c = lines.count()
print("lines counted")

Error -

Starting job: count at /root/wordcount.py:11
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Got job 0 (count at /root/wordcount.py:11) with 2 output partitions
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11), which has no missing parents
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 56.2 KB)
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 59.7 KB)
16/07/18 07:46:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.17.189:43684 (size: 3.4 KB, free: 511.5 MB)
16/07/18 07:46:39 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to Spark UI showing 0 cores even when setting cores in App,

the Spark WebUI reports zero cores in use and waits indefinitely with no tasks running. The application also uses no memory or cores while running, and goes into the WAITING state immediately on startup.

Spark version 1.6.1, Ubuntu, Amazon EC2
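Before looking at the application itself, it can help to confirm what the master actually believes it has. The standalone master web UI serves a machine-readable summary at /json on its UI port (8080 by default); a small sketch, assuming that port is reachable from where you run it:

import requests

# Poll the standalone master's JSON status page (default UI port 8080).
status = requests.get(
    "http://ec2-54-209-108-127.compute-1.amazonaws.com:8080/json").json()

print(status["cores"], "cores total,", status["coresused"], "in use")
print(status["memory"], "MB total,", status["memoryused"], "MB in use")
for app in status["activeapps"]:
    print(app["name"], app["state"])  # a healthy app shows RUNNING, not WAITING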

Best Answer

I had the same problem. Below are my observations from when it happened.

1:17:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I noticed that it only happened during the first query in the Scala shell, where I ran an action that fetches data from HDFS.

When the problem occurred, the WebUI showed no running applications:

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed 
Status: ALIVE

It looks like something fails to start, but I cannot tell exactly which component it is.

However, restarting the cluster a second time sets the Applications value to 1 and everything works fine:

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

I am still investigating; this quick workaround can save time until a final solution is found.
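For reference, the restart described above amounts to stopping and starting the standalone daemons from the master; a sketch, assuming Spark lives under /root/spark as the paths in the question suggest:

import subprocess

SPARK_HOME = "/root/spark"  # assumed from the paths in the question

# stop-all.sh / start-all.sh ship with standalone Spark and restart the
# master plus every worker listed in conf/slaves.
subprocess.check_call([SPARK_HOME + "/sbin/stop-all.sh"])
subprocess.check_call([SPARK_HOME + "/sbin/start-all.sh"])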

Regarding api - Spark job submitted - Waiting (TaskSchedulerImpl: Initial job not accepted), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38359801/
