I am using Spark 2.0 with PySpark.

I am redefining SparkSession parameters through the GetOrCreate method introduced in 2.0:
This method first checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.
https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate
So far so good:
from pyspark import SparkConf
SparkConf().toDebugString()
'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'
spark.conf.get("spark.app.name")
'pyspark-shell'
Then I redefine the SparkSession configuration, with the promise of seeing the changes in the WebUI:

appName(name)
Sets a name for the application, which will be shown in the Spark web UI.
https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName
c = SparkConf()
(c
.setAppName("MyApp")
.setMaster("local")
.set("spark.driver.memory","1g")
)
from pyspark.sql import SparkSession
(SparkSession
.builder
.enableHiveSupport() # metastore, serdes, Hive udf
.config(conf=c)
.getOrCreate())
spark.conf.get("spark.app.name")
'MyApp'
Now, when I go to localhost:4040, I expect to see MyApp as the app name. However, I still see the pyspark-shell application UI.

Where am I going wrong? Thanks in advance!
Accepted answer
I believe that the documentation is a bit misleading here; when you work with Scala you actually see a warning like this:
... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.
Prior to Spark 2.0, the separation between contexts was more explicit:

SparkContext - the configuration cannot be modified at runtime. You have to stop the existing context first.
SQLContext - the configuration can be modified at runtime.

spark.app.name, like many other options, is bound to the SparkContext and cannot be modified without stopping the context.

Reusing the existing SparkContext / SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
spark.conf.get("spark.sql.shuffle.partitions")
String = 200
val conf = new SparkConf()
.setAppName("foo")
.set("spark.sql.shuffle.partitions", "2001")
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkSession$Builder: Use an existing SparkSession ...
spark: org.apache.spark.sql.SparkSession = ...
spark.conf.get("spark.sql.shuffle.partitions")
String = 2001
While the spark.app.name config is updated:

spark.conf.get("spark.app.name")
String = foo
it does not affect the SparkContext:

spark.sparkContext.appName
String = Spark shell
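This split between a mutable runtime conf and an immutable context can be mimicked in a few lines of plain Python. The following is a hypothetical sketch, not Spark's actual implementation: the names FakeContext, FakeSession, and get_or_create are invented stand-ins. It only illustrates the semantics described above, where getOrCreate applies new options to the existing session's conf but never rebuilds the already-created context.

```python
# Hypothetical sketch of SparkSession.builder.getOrCreate() semantics.
# FakeContext / FakeSession / get_or_create are invented names, not Spark APIs.

_default_session = None  # global default session, as in Spark


class FakeContext:
    """Stands in for SparkContext: appName is fixed at creation time."""
    def __init__(self, conf):
        self.app_name = conf.get("spark.app.name", "pyspark-shell")


class FakeSession:
    """Stands in for SparkSession: the runtime conf stays mutable."""
    def __init__(self, conf):
        self.conf = dict(conf)
        self.context = FakeContext(conf)


def get_or_create(options):
    """Return the global session, applying options to its runtime conf only."""
    global _default_session
    if _default_session is None:
        _default_session = FakeSession(options)
    else:
        # An existing session is reused: options update the mutable conf,
        # but the already-created context is left untouched.
        _default_session.conf.update(options)
    return _default_session


spark = get_or_create({})                         # first call creates the session
spark = get_or_create({"spark.app.name": "MyApp"})  # second call only updates conf
print(spark.conf["spark.app.name"])               # MyApp
print(spark.context.app_name)                     # pyspark-shell
```

This mirrors the Scala session above: the conf reports the new name while the context keeps the one it was created with.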
Stopping the existing SparkContext / SparkSession
Now let's stop the session and repeat the process:
spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkContext: Use an existing SparkContext ...
spark: org.apache.spark.sql.SparkSession = ...
spark.sparkContext.appName
String = foo
Interestingly enough, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it has actually been stopped.
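Why stopping first makes the rename take effect can also be sketched in plain Python. Again a hypothetical model with invented names (Context, Session, get_or_create), not Spark code: once stop() clears the global default, the next getOrCreate has nothing to reuse and must build a brand-new context.

```python
# Hypothetical sketch (invented names, not Spark APIs) of stop() + getOrCreate().

_default = None  # global default session


class Context:
    """Stands in for SparkContext: appName is fixed at creation time."""
    def __init__(self, conf):
        self.app_name = conf.get("spark.app.name", "pyspark-shell")


class Session:
    """Stands in for SparkSession."""
    def __init__(self, conf):
        self.conf = dict(conf)
        self.context = Context(conf)

    def stop(self):
        global _default
        _default = None  # clear the default; the next getOrCreate starts fresh


def get_or_create(options):
    global _default
    if _default is None:
        _default = Session(options)      # no default: build a new context too
    else:
        _default.conf.update(options)    # default exists: conf update only
    return _default


spark = get_or_create({})
spark.stop()                             # stop first ...
spark = get_or_create({"spark.app.name": "foo"})
print(spark.context.app_name)            # foo -- the new context picked up the name
```

This matches the Scala output above: after the stop, even spark.sparkContext.appName reflects the new value, which is what the asker would need to do in PySpark to see MyApp in the WebUI.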
Regarding apache-spark - Spark 2.0: Redefining SparkSession params through GetOrCreate and NOT seeing changes in WebUI, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40701518/