python - How to read Spark processing logs?

Tags: python apache-spark hadoop

I ran a Python script like this:

spark-submit \
 --master yarn \
 --deploy-mode client \
 --driver-memory 2G \
 --driver-cores 2 \
 --executor-memory 8G \
 --num-executors 3 \
 --executor-cores 3 \
 script.py

and I get logs like this:

spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
[Stage 1:=================================================>       (13 + 2) / 15]18/04/13 13:49:18 ERROR YarnScheduler: Lost executor 3 on serverw19.domain: Container killed by YARN for exceeding memory limits. 12.0 GB of 12 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
[Stage 1:=====================================================>   (14 + 1) / 15]18/04/13 14:01:43 ERROR YarnScheduler: Lost executor 1 on serverw51.domain: Container killed by YARN for exceeding memory limits. 12.0 GB of 12 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
[Stage 1:====================================================>   (14 + -1) / 15]18/04/13 14:02:48 ERROR YarnScheduler: Lost executor 2 on serverw15.domain: Container killed by YARN for exceeding memory limits. 12.0 GB of 12 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
[Stage 1:====================================================>   (14 + -8) / 15]18/04/13 14:02:49 ERROR YarnScheduler: Lost an executor 2 (already removed): Pending loss reason.
[Stage 1:=======================================================(26 + -11) / 15]18/04/13 14:29:53 ERROR YarnScheduler: Lost executor 5 on serverw38.domain: Container killed by YARN for exceeding memory limits. 12.0 GB of 12 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
[Stage 1:=======================================================(28 + -13) / 15]18/04/13 14:43:35 ERROR YarnScheduler: Lost executor 6 on serverw10.domain: Slave lost
18/04/13 14:43:35 ERROR TransportChannelHandler: Connection to serverw22.domain/10.252.139.122:54308 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
[Stage 1:=======================================================(28 + -15) / 15]18/04/13 14:44:22 ERROR TransportClient: Failed to send RPC 9128980605450004417 to serverw22.domain/10.252.139.122:54308: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
18/04/13 14:44:22 ERROR YarnScheduler: Lost executor 4 on serverw36.domain: Slave lost
[Stage 1:=======================================================(31 + -25) / 15]18/04/13 15:05:11 ERROR TransportClient: Failed to send RPC 7766740408770504900 to serverw22.domain/10.252.139.122:54308: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
18/04/13 15:05:11 ERROR YarnScheduler: Lost executor 7 on serverw38.domain: Slave lost
[Stage 1:=======================================================(31 + -25) / 15]
  1. What do the values in brackets mean? (13 + 2) / 15, later (28 + -13) / 15 and so on, and finally (31 + -25) / 15
  2. Why are executors being lost?
  3. Is this application dead and should I kill it, or will it finish successfully?

Regards,
Pavel

Best Answer

What do the values in brackets mean? (13 + 2) / 15, later (28 + -13) / 15 and so on, and finally (31 + -25) / 15

The first number is the count of partitions that have already completed in the current operation.

The second number is the count of partitions currently being processed. If it goes negative, it means that results already counted were invalidated (here, because the executors holding them were lost) and those partitions must be recomputed.

Finally, the last number is the total number of partitions in the current operation.
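
To make that concrete, here is how a few of the progress-bar lines from the log above decode. The annotations are mine, and the reading of the last line is an interpretation consistent with the recomputation behaviour just described:

(13 + 2) / 15    # 13 partitions done, 2 currently processing, 15 total
(14 + -1) / 15   # an executor died, so an in-flight result already counted
                 # was invalidated and the running count went negative
(31 + -25) / 15  # "done" exceeds the total because recomputed partitions
                 # get counted again; the bar is now only a rough hint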

Why are executors being lost?

As the error messages in the log say, the tasks used more memory than was actually allocated to the executors, so YARN killed the containers.
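
The error message itself names the knob to turn: spark.yarn.executor.memoryOverhead, the off-heap headroom YARN adds on top of --executor-memory when sizing the container (with PySpark, the Python worker processes live in this overhead, which is a common reason for exceeding it). A sketch of a resubmission with a larger overhead; the 2048 MB value is an assumption to be tuned for your workload:

spark-submit \
 --master yarn \
 --deploy-mode client \
 --driver-memory 2G \
 --driver-cores 2 \
 --executor-memory 8G \
 --num-executors 3 \
 --executor-cores 3 \
 --conf spark.yarn.executor.memoryOverhead=2048 \
 script.py

Lowering --executor-cores can also help, since fewer concurrent tasks (and Python workers) per executor means a lower peak memory footprint.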

Is this application dead and should I kill it, or will it finish successfully?

Normally Spark should be able to bring the application to completion (whether it ends successfully or with an error). In this case, however, I wouldn't hold out much hope for a successful finish, so if I were you I would kill it and review the memory settings.
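
If you do decide to kill it, the YARN CLI is the reliable way to terminate the application. A sketch, where the application ID is a placeholder to be looked up with the first command:

# find the ID of the running application
yarn application -list
# terminate it (the ID below is made up; substitute your own)
yarn application -kill application_1523456789012_0042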

Regarding python - How to read Spark processing logs?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49818007/
