apache-spark - ContextCleaner: Cleaned accumulator - what does it mean in Scala Spark?

Tags: apache-spark

When I run my Spark program I see the output below, and then the job finishes slowly. What does it mean?

19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 82
19/04/01 15:34:24 INFO ContextCleaner: Cleaned shuffle 0
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 69
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 30
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 40
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 61
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 41
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 52
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 29
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 31
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 57
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 60
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 87
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 79
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 78
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 84
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 34
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 49
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 75
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 88
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 48

The dependencies I'm using (build.sbt):
name := "BigData"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.5"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
// https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc
libraryDependencies += "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8"
libraryDependencies += "com.databricks" % "spark-xml_2.11" % "0.4.1"

// https://mvnrepository.com/artifact/com.typesafe.akka/akka-actor
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.5.19"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-http
libraryDependencies += "com.typesafe.akka" %% "akka-http" % "10.1.5"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-stream
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.19"

// https://mvnrepository.com/artifact/org.apache.livy/livy-core
libraryDependencies += "org.apache.livy" %% "livy-core" % "0.5.0-incubating"

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.4"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.4"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.9.4"
// https://mvnrepository.com/artifact/net.liftweb/lift-json
libraryDependencies += "net.liftweb" %% "lift-json" % "3.2.0"

// https://mvnrepository.com/artifact/org.json4s/json4s-jackson
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.5"

// https://mvnrepository.com/artifact/org.json4s/json4s-native
libraryDependencies += "org.json4s" %% "json4s-native" % "3.6.5"

// https://mvnrepository.com/artifact/oracle/xdb

//libraryDependencies += "oracle" % "xdb" % "1.0"

Best answer

These INFO lines come from Spark's ContextCleaner, which garbage-collects application-level resources: once an RDD, shuffle, broadcast variable, or accumulator is no longer referenced by the driver, the cleaner removes its state from the executors and logs a "Cleaned ..." message. They are routine housekeeping, not errors. If you want, you can disable the ContextCleaner with the following properties:

spark.cleaner.referenceTracking false
spark.cleaner.referenceTracking.blocking false
spark.cleaner.referenceTracking.blocking.shuffle false
spark.cleaner.referenceTracking.cleanCheckpoints false 

However, if you are running on Spark 2.1 you don't need to set these properties explicitly.
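
For reference, here is a minimal sketch of how those properties could be set programmatically when building the SparkSession (the app name and local master are placeholders; the same keys can equally be passed as --conf flags to spark-submit):

import org.apache.spark.sql.SparkSession

object NoContextCleaner {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BigData")
      .master("local[*]") // placeholder: local run for illustration only
      // Turn off reference tracking so the ContextCleaner is not started at all
      .config("spark.cleaner.referenceTracking", "false")
      .config("spark.cleaner.referenceTracking.blocking", "false")
      .config("spark.cleaner.referenceTracking.blocking.shuffle", "false")
      .config("spark.cleaner.referenceTracking.cleanCheckpoints", "false")
      .getOrCreate()

    // ... your job ...

    spark.stop()
  }
}

If the goal is only to quiet the log output rather than change the cleanup behaviour, raising the log level (for example spark.sparkContext.setLogLevel("WARN"), or the equivalent log4j configuration) hides these INFO lines without disabling the cleaner.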

You can find more information here:

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html

https://books.japila.pl/apache-spark-internals/apache-spark-internals/core/ContextCleaner.html

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55452892/
