java - Flink: job won't run with higher taskmanager.heap.mb

Tags: java, apache-flink

Simple job: kafka -> flatmap -> reduce -> map

The job runs fine with the default value of taskmanager.heap.mb (512 MB). According to the docs, this value should be as large as possible. Since the machine in question has 96 GB of RAM, I set it to 75000 (an arbitrary value).
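For reference, a minimal sketch of the relevant entries in conf/flink-conf.yaml for a standalone setup; the slot count shown is an illustrative assumption, not a value from the question:

# conf/flink-conf.yaml
# Heap size handed to each TaskManager JVM, in megabytes.
taskmanager.heap.mb: 75000
# Slots offered per TaskManager (illustrative value, adjust to the job's parallelism).
taskmanager.numberOfTaskSlots: 1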

Starting the job produces this error:

Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.   
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$5.apply$mcV$sp(JobManager.scala:563)   
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$5.apply(JobManager.scala:509)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$5.apply(JobManager.scala:509)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (Source: Custom Source (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 95b239d1777b2baf728645df9a1c4232 > in sharing group < SlotSharingGroup [772c9ff1cf0b6cb3a361e3352f75fcee, d4f856f13654f424d7c49d0f00f6ecca, 81bb8c4310faefe32f97ebd6baa4c04f, 95b239d1777b2baf728645df9a1c4232] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:255)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:686)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
... 8 more

Reverting the parameter to its default value (512), the job runs fine again. It works at 5000 -> fails at 10000.

What am I missing?


Edit: this is more erratic than I thought. Setting the value to 50000 and resubmitting succeeds. The cluster is stopped and restarted for every test.

Best answer

What you are probably experiencing is that you submit the job before the workers have registered with the master.

A 5 GB JVM heap is initialized quickly and the TaskManager can register almost immediately. With a 70 GB heap, the JVM takes a while to initialize and start up. Consequently, the workers register late, and when you submit the job it cannot be executed because the workers are missing.

That is also why it works once you resubmit the job.
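One way to guard against this race is to wait until the JobManager reports at least one registered TaskManager before submitting. The sketch below is only an illustration under assumptions: it presumes the JobManager's web frontend is reachable on port 8081 and exposes a /taskmanagers endpoint listing registered workers (present in recent Flink versions); the class name, timeout, and polling interval are not taken from the original answer.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Polls the JobManager's web frontend until at least one TaskManager has
// registered, then returns so the job can be submitted. Port 8081 and the
// /taskmanagers endpoint are assumptions about the setup.
public class WaitForTaskManagers {

    public static void main(String[] args) throws Exception {
        String jobManagerUrl = args.length > 0 ? args[0] : "http://localhost:8081";
        long deadline = System.currentTimeMillis() + 120_000; // allow a large heap time to come up

        while (System.currentTimeMillis() < deadline) {
            if (hasRegisteredTaskManager(jobManagerUrl)) {
                System.out.println("TaskManager registered - safe to submit the job.");
                return;
            }
            Thread.sleep(2_000); // poll every two seconds
        }
        throw new IllegalStateException("No TaskManager registered before the deadline");
    }

    // Crude check: the endpoint is expected to return a JSON document whose
    // "taskmanagers" array stays empty until at least one worker has registered.
    private static boolean hasRegisteredTaskManager(String baseUrl) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(baseUrl + "/taskmanagers").openConnection();
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(5_000);
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line);
                }
            }
            // Avoid a JSON library for this sketch: an empty list means no workers yet.
            return !body.toString().contains("\"taskmanagers\":[]");
        } catch (Exception e) {
            return false; // JobManager not reachable yet
        }
    }
}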

If you start the cluster in "streaming" mode (standalone via start-cluster-streaming.sh), the JVM initializes faster because at least Flink's internal memory is initialized lazily.

Regarding "java - Flink: job won't run with higher taskmanager.heap.mb", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33601020/
