java - 独立集群中的 Spark 动态分配使我的应用程序失败

我正在 Spark 1.5.2 Build Over Scala 2.11 On Windows 7 在独立模式 和两个 Spark 应用程序上运行，所有内核都分配给两个应用程序.

Dynamic Allocation is Enabled From Configuration

当我运行我的应用程序的两个实例时，没有一个应用程序完成。

这是我的申请 -

package com.cleartrail.clearinsight.spark;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class Main {

    public static void main(String[] args) {

        SparkConf conf = new SparkConf().setAppName("SparkPOC").setMaster("spark://172.50.33.159:7077").set("spark.dynamicAllocation.enabled", "true")
                .set("spark.shuffle.service.enabled", "true").set("spark.dynamicAllocation.executorIdleTimeout", "10s");// .set("spark.cores.max", "2");
        // SparkConf conf = new SparkConf().setAppName("SparkPOC").setMaster("local");

        SparkContext sparkContext = SparkContext.getOrCreate(conf);
        JavaSparkContext sc = new JavaSparkContext(sparkContext);
        sc.addJar("./target/spark-poc-0.0.1-SNAPSHOT.jar");
        try {
            List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
            JavaRDD<Integer> distData = sc.parallelize(data);
            System.out.println(distData.count());
        }
        finally {
            sc.close();
        }
    }
}

控制台错误 -

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/06/10 14:20:03 INFO SparkContext: Running Spark version 1.5.2
16/06/10 14:20:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/10 14:20:03 INFO SecurityManager: Changing view acls to: pranjal.jaju
16/06/10 14:20:03 INFO SecurityManager: Changing modify acls to: pranjal.jaju
16/06/10 14:20:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pranjal.jaju); users with modify permissions: Set(pranjal.jaju)
16/06/10 14:20:04 INFO Slf4jLogger: Slf4jLogger started
16/06/10 14:20:04 INFO Remoting: Starting remoting
16/06/10 14:20:04 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.50.33.159:58257]
16/06/10 14:20:04 INFO Utils: Successfully started service 'sparkDriver' on port 58257.
16/06/10 14:20:04 INFO SparkEnv: Registering MapOutputTracker
16/06/10 14:20:04 INFO SparkEnv: Registering BlockManagerMaster
16/06/10 14:20:04 INFO DiskBlockManager: Created local directory at C:\Users\pranjal.jaju\AppData\Local\Temp\blockmgr-99f1ab11-9aff-40f3-b2b5-3c992972657c
16/06/10 14:20:04 INFO MemoryStore: MemoryStore started with capacity 965.8 MB
16/06/10 14:20:04 INFO HttpFileServer: HTTP File server directory is C:\Users\pranjal.jaju\AppData\Local\Temp\spark-bcb3998e-0cd5-4146-9a01-ec237325fc5f\httpd-ffd9b538-3ebb-47ef-9005-4c66383466b0
16/06/10 14:20:04 INFO HttpServer: Starting HTTP Server
16/06/10 14:20:04 INFO Utils: Successfully started service 'HTTP file server' on port 58259.
16/06/10 14:20:05 INFO SparkEnv: Registering OutputCommitCoordinator
16/06/10 14:20:05 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/06/10 14:20:05 INFO SparkUI: Started SparkUI at http://172.50.33.159:4040
16/06/10 14:20:05 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Connecting to master spark://172.50.33.159:7077...
16/06/10 14:20:05 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160610142005-0004
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/0 on worker-20160610141809-172.50.33.159-57648 (172.50.33.159:57648) with 4 cores
16/06/10 14:20:05 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/0 on hostPort 172.50.33.159:57648 with 4 cores, 1024.0 MB RAM
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/1 on worker-20160610141326-172.50.33.159-56922 (172.50.33.159:56922) with 4 cores
16/06/10 14:20:05 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/1 on hostPort 172.50.33.159:56922 with 4 cores, 1024.0 MB RAM
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/0 is now LOADING
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/0 is now RUNNING
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/1 is now RUNNING
16/06/10 14:20:05 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/1 is now LOADING
16/06/10 14:20:06 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58279.
16/06/10 14:20:06 INFO NettyBlockTransferService: Server created on 58279
16/06/10 14:20:06 INFO BlockManagerMaster: Trying to register BlockManager
16/06/10 14:20:06 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58279 with 965.8 MB RAM, BlockManagerId(driver, 172.50.33.159, 58279)
16/06/10 14:20:06 INFO BlockManagerMaster: Registered BlockManager
16/06/10 14:20:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/06/10 14:20:07 INFO SparkContext: Added JAR ./target/spark-poc-0.0.1-SNAPSHOT.jar at http://172.50.33.159:58259/jars/spark-poc-0.0.1-SNAPSHOT.jar with timestamp 1465548607657
16/06/10 14:20:11 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58341/user/Executor#-804456045]) with ID 1
16/06/10 14:20:11 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)
16/06/10 14:20:11 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58340/user/Executor#-859540224]) with ID 0
16/06/10 14:20:11 INFO ExecutorAllocationManager: New executor 0 has registered (new total is 2)
16/06/10 14:20:12 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58379 with 530.0 MB RAM, BlockManagerId(1, 172.50.33.159, 58379)
16/06/10 14:20:12 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58390 with 530.0 MB RAM, BlockManagerId(0, 172.50.33.159, 58390)
16/06/10 14:20:21 INFO SparkDeploySchedulerBackend: Requesting to kill executor(s) 1
16/06/10 14:20:21 INFO ExecutorAllocationManager: Removing executor 1 because it has been idle for 10 seconds (new desired total will be 1)
16/06/10 14:20:21 ERROR TaskSchedulerImpl: Lost executor 1 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:20:21 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58341] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:20:21 INFO ExecutorAllocationManager: Existing executor 1 has been removed (new total is 1)
16/06/10 14:20:21 INFO DAGScheduler: Executor lost: 1 (epoch 0)
16/06/10 14:20:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/06/10 14:20:21 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, 172.50.33.159, 58379)
16/06/10 14:20:21 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
16/06/10 14:20:21 INFO SparkDeploySchedulerBackend: Requesting to kill executor(s) 0
16/06/10 14:20:21 INFO ExecutorAllocationManager: Removing executor 0 because it has been idle for 10 seconds (new desired total will be 0)
16/06/10 14:20:21 ERROR TaskSchedulerImpl: Lost executor 0 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:20:21 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58340] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:20:21 INFO ExecutorAllocationManager: Existing executor 0 has been removed (new total is 0)
16/06/10 14:20:21 INFO DAGScheduler: Executor lost: 0 (epoch 0)
16/06/10 14:20:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster.
16/06/10 14:20:21 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(0, 172.50.33.159, 58390)
16/06/10 14:20:21 INFO BlockManagerMaster: Removed 0 successfully in removeExecutor
16/06/10 14:20:30 INFO SparkContext: Starting job: count at Main.java:26
16/06/10 14:20:30 INFO DAGScheduler: Got job 0 (count at Main.java:26) with 2 output partitions
16/06/10 14:20:30 INFO DAGScheduler: Final stage: ResultStage 0(count at Main.java:26)
16/06/10 14:20:30 INFO DAGScheduler: Parents of final stage: List()
16/06/10 14:20:30 INFO DAGScheduler: Missing parents: List()
16/06/10 14:20:30 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at Main.java:25), which has no missing parents
16/06/10 14:20:31 INFO MemoryStore: ensureFreeSpace(1384) called with curMem=0, maxMem=1012704215
16/06/10 14:20:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1384.0 B, free 965.8 MB)
16/06/10 14:20:31 INFO MemoryStore: ensureFreeSpace(942) called with curMem=1384, maxMem=1012704215
16/06/10 14:20:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 942.0 B, free 965.8 MB)
16/06/10 14:20:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.50.33.159:58279 (size: 942.0 B, free: 965.8 MB)
16/06/10 14:20:31 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/06/10 14:20:31 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at Main.java:25)
16/06/10 14:20:31 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/06/10 14:20:32 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/2 on worker-20160610141809-172.50.33.159-57648 (172.50.33.159:57648) with 4 cores
16/06/10 14:20:32 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
16/06/10 14:20:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/2 on hostPort 172.50.33.159:57648 with 4 cores, 1024.0 MB RAM
16/06/10 14:20:32 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/2 is now LOADING
16/06/10 14:20:32 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/2 is now RUNNING
16/06/10 14:20:33 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/3 on worker-20160610141326-172.50.33.159-56922 (172.50.33.159:56922) with 3 cores
16/06/10 14:20:33 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/3 on hostPort 172.50.33.159:56922 with 3 cores, 1024.0 MB RAM
16/06/10 14:20:33 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
16/06/10 14:20:33 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/3 is now RUNNING
16/06/10 14:20:33 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/3 is now LOADING
16/06/10 14:20:36 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58510/user/Executor#1976924864]) with ID 2
16/06/10 14:20:36 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 1)
16/06/10 14:20:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.50.33.159, PROCESS_LOCAL, 2136 bytes)
16/06/10 14:20:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.50.33.159, PROCESS_LOCAL, 2146 bytes)
16/06/10 14:20:36 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58543 with 530.0 MB RAM, BlockManagerId(2, 172.50.33.159, 58543)
16/06/10 14:20:37 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58565/user/Executor#540501534]) with ID 3
16/06/10 14:20:37 INFO ExecutorAllocationManager: New executor 3 has registered (new total is 2)
16/06/10 14:20:37 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58581 with 530.0 MB RAM, BlockManagerId(3, 172.50.33.159, 58581)
16/06/10 14:20:47 INFO SparkDeploySchedulerBackend: Requesting to kill executor(s) 3
16/06/10 14:20:47 INFO ExecutorAllocationManager: Removing executor 3 because it has been idle for 10 seconds (new desired total will be 1)
16/06/10 14:20:47 ERROR TaskSchedulerImpl: Lost executor 3 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:20:47 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58565] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:20:47 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet 0.0
16/06/10 14:20:47 INFO DAGScheduler: Executor lost: 3 (epoch 0)
16/06/10 14:20:47 INFO BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
16/06/10 14:20:47 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, 172.50.33.159, 58581)
16/06/10 14:20:47 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
16/06/10 14:20:47 INFO ExecutorAllocationManager: Existing executor 3 has been removed (new total is 1)
16/06/10 14:20:50 ERROR TaskSchedulerImpl: Lost executor 2 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:20:50 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0
16/06/10 14:20:50 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58510] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:20:50 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.50.33.159): ExecutorLostFailure (executor 2 lost)
16/06/10 14:20:50 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.50.33.159): ExecutorLostFailure (executor 2 lost)
16/06/10 14:20:50 INFO DAGScheduler: Executor lost: 2 (epoch 0)
16/06/10 14:20:50 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
16/06/10 14:20:50 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, 172.50.33.159, 58543)
16/06/10 14:20:50 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
16/06/10 14:20:50 INFO ExecutorAllocationManager: Existing executor 2 has been removed (new total is 0)
16/06/10 14:20:50 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/2 is now EXITED (Command exited with code 1)
16/06/10 14:20:50 INFO SparkDeploySchedulerBackend: Executor app-20160610142005-0004/2 removed: Command exited with code 1
16/06/10 14:20:50 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
16/06/10 14:20:50 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/4 on worker-20160610141809-172.50.33.159-57648 (172.50.33.159:57648) with 4 cores
16/06/10 14:20:50 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/4 on hostPort 172.50.33.159:57648 with 4 cores, 1024.0 MB RAM
16/06/10 14:20:50 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/4 is now LOADING
16/06/10 14:20:50 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/4 is now RUNNING
16/06/10 14:20:53 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58657/user/Executor#-1698957872]) with ID 4
16/06/10 14:20:53 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, 172.50.33.159, PROCESS_LOCAL, 2136 bytes)
16/06/10 14:20:53 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, 172.50.33.159, PROCESS_LOCAL, 2146 bytes)
16/06/10 14:20:53 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 1)
16/06/10 14:20:53 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58677 with 530.0 MB RAM, BlockManagerId(4, 172.50.33.159, 58677)
16/06/10 14:21:07 ERROR TaskSchedulerImpl: Lost executor 4 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:21:07 INFO TaskSetManager: Re-queueing tasks for 4 from TaskSet 0.0
16/06/10 14:21:07 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, 172.50.33.159): ExecutorLostFailure (executor 4 lost)
16/06/10 14:21:07 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3, 172.50.33.159): ExecutorLostFailure (executor 4 lost)
16/06/10 14:21:07 INFO DAGScheduler: Executor lost: 4 (epoch 0)
16/06/10 14:21:07 INFO BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
16/06/10 14:21:07 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(4, 172.50.33.159, 58677)
16/06/10 14:21:07 INFO BlockManagerMaster: Removed 4 successfully in removeExecutor
16/06/10 14:21:07 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58657] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:21:07 INFO ExecutorAllocationManager: Existing executor 4 has been removed (new total is 0)
16/06/10 14:21:07 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/4 is now EXITED (Command exited with code 1)
16/06/10 14:21:07 INFO SparkDeploySchedulerBackend: Executor app-20160610142005-0004/4 removed: Command exited with code 1
16/06/10 14:21:07 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
16/06/10 14:21:07 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/5 on worker-20160610141809-172.50.33.159-57648 (172.50.33.159:57648) with 4 cores
16/06/10 14:21:07 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/5 on hostPort 172.50.33.159:57648 with 4 cores, 1024.0 MB RAM
16/06/10 14:21:07 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/5 is now LOADING
16/06/10 14:21:07 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/5 is now RUNNING
16/06/10 14:21:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58754/user/Executor#-457099261]) with ID 5
16/06/10 14:21:10 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, 172.50.33.159, PROCESS_LOCAL, 2146 bytes)
16/06/10 14:21:10 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 5, 172.50.33.159, PROCESS_LOCAL, 2136 bytes)
16/06/10 14:21:10 INFO ExecutorAllocationManager: New executor 5 has registered (new total is 1)
16/06/10 14:21:10 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58773 with 530.0 MB RAM, BlockManagerId(5, 172.50.33.159, 58773)
16/06/10 14:21:23 ERROR TaskSchedulerImpl: Lost executor 5 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:21:23 INFO TaskSetManager: Re-queueing tasks for 5 from TaskSet 0.0
16/06/10 14:21:23 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5, 172.50.33.159): ExecutorLostFailure (executor 5 lost)
16/06/10 14:21:23 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4, 172.50.33.159): ExecutorLostFailure (executor 5 lost)
16/06/10 14:21:23 INFO DAGScheduler: Executor lost: 5 (epoch 0)
16/06/10 14:21:23 INFO BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
16/06/10 14:21:23 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, 172.50.33.159, 58773)
16/06/10 14:21:23 INFO BlockManagerMaster: Removed 5 successfully in removeExecutor
16/06/10 14:21:23 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58754] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:21:23 INFO ExecutorAllocationManager: Existing executor 5 has been removed (new total is 0)
16/06/10 14:21:23 WARN ExecutorAllocationManager: Attempted to mark unknown executor 5 idle
16/06/10 14:21:24 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/5 is now EXITED (Command exited with code 1)
16/06/10 14:21:24 INFO SparkDeploySchedulerBackend: Executor app-20160610142005-0004/5 removed: Command exited with code 1
16/06/10 14:21:24 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 5
16/06/10 14:21:24 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/6 on worker-20160610141809-172.50.33.159-57648 (172.50.33.159:57648) with 4 cores
16/06/10 14:21:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/6 on hostPort 172.50.33.159:57648 with 4 cores, 1024.0 MB RAM
16/06/10 14:21:24 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/6 is now RUNNING
16/06/10 14:21:24 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/6 is now LOADING
16/06/10 14:21:26 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58853/user/Executor#-1050109075]) with ID 6
16/06/10 14:21:26 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 6, 172.50.33.159, PROCESS_LOCAL, 2146 bytes)
16/06/10 14:21:26 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 7, 172.50.33.159, PROCESS_LOCAL, 2136 bytes)
16/06/10 14:21:26 INFO ExecutorAllocationManager: New executor 6 has registered (new total is 1)
16/06/10 14:21:26 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58872 with 530.0 MB RAM, BlockManagerId(6, 172.50.33.159, 58872)
16/06/10 14:21:40 ERROR TaskSchedulerImpl: Lost executor 6 on 172.50.33.159: remote Rpc client disassociated
16/06/10 14:21:40 INFO TaskSetManager: Re-queueing tasks for 6 from TaskSet 0.0
16/06/10 14:21:40 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58853] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:21:40 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7, 172.50.33.159): ExecutorLostFailure (executor 6 lost)
16/06/10 14:21:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/06/10 14:21:40 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6, 172.50.33.159): ExecutorLostFailure (executor 6 lost)
16/06/10 14:21:40 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/06/10 14:21:40 INFO ExecutorAllocationManager: Existing executor 6 has been removed (new total is 0)
16/06/10 14:21:40 INFO TaskSchedulerImpl: Cancelling stage 0
16/06/10 14:21:40 INFO DAGScheduler: ResultStage 0 (count at Main.java:26) failed in 69.342 s
16/06/10 14:21:40 INFO DAGScheduler: Job 0 failed: count at Main.java:26, took 69.603206 s
16/06/10 14:21:40 WARN ExecutorAllocationManager: No stages are running, but numRunningTasks != 0
16/06/10 14:21:40 INFO DAGScheduler: Executor lost: 6 (epoch 0)
16/06/10 14:21:40 WARN ExecutorAllocationManager: Attempted to mark unknown executor 6 idle
16/06/10 14:21:40 INFO BlockManagerMasterEndpoint: Trying to remove executor 6 from BlockManagerMaster.
16/06/10 14:21:40 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(6, 172.50.33.159, 58872)
16/06/10 14:21:40 INFO BlockManagerMaster: Removed 6 successfully in removeExecutor
16/06/10 14:21:40 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/6 is now EXITED (Command exited with code 1)
16/06/10 14:21:40 INFO SparkDeploySchedulerBackend: Executor app-20160610142005-0004/6 removed: Command exited with code 1
16/06/10 14:21:40 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 6
16/06/10 14:21:40 INFO AppClient$ClientEndpoint: Executor added: app-20160610142005-0004/7 on worker-20160610141809-172.50.33.159-57648 (172.50.33.159:57648) with 4 cores
16/06/10 14:21:40 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160610142005-0004/7 on hostPort 172.50.33.159:57648 with 4 cores, 1024.0 MB RAM
16/06/10 14:21:40 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/7 is now LOADING
16/06/10 14:21:40 INFO AppClient$ClientEndpoint: Executor updated: app-20160610142005-0004/7 is now RUNNING
16/06/10 14:21:43 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.50.33.159:58949/user/Executor#645252204]) with ID 7
16/06/10 14:21:43 INFO ExecutorAllocationManager: New executor 7 has registered (new total is 1)
16/06/10 14:21:43 INFO BlockManagerMasterEndpoint: Registering block manager 172.50.33.159:58968 with 530.0 MB RAM, BlockManagerId(7, 172.50.33.159, 58968)
16/06/10 14:21:48 INFO SparkUI: Stopped Spark web UI at http://172.50.33.159:4040
16/06/10 14:21:48 INFO DAGScheduler: Stopping DAGScheduler
16/06/10 14:21:48 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/06/10 14:21:48 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/06/10 14:21:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@172.50.33.159:58949] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
16/06/10 14:21:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/06/10 14:21:48 INFO MemoryStore: MemoryStore cleared
16/06/10 14:21:48 INFO BlockManager: BlockManager stopped
16/06/10 14:21:48 INFO BlockManagerMaster: BlockManagerMaster stopped
16/06/10 14:21:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/06/10 14:21:48 INFO SparkContext: Successfully stopped SparkContext
16/06/10 14:21:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/06/10 14:21:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/06/10 14:21:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 172.50.33.159): ExecutorLostFailure (executor 6 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1125)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:445)
    at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:47)
    at com.cleartrail.clearinsight.spark.Main.main(Main.java:26)
16/06/10 14:21:49 INFO ShutdownHookManager: Shutdown hook called
16/06/10 14:21:49 INFO ShutdownHookManager: Deleting directory C:\Users\pranjal.jaju\AppData\Local\Temp\spark-bcb3998e-0cd5-4146-9a01-ec237325fc5f

最佳答案

有一个物理洗牌服务需要启动，可能是你没有在 spark 中启动该服务

关于java - 独立集群中的 Spark 动态分配使我的应用程序失败，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/37744297/

java - 独立集群中的 Spark 动态分配使我的应用程序失败

上一篇：java - 向 guice servlet 添加过滤器

下一篇：java - 如何制作具有多种类型节点的树并且每个节点可以在java中有多个子节点