apache-spark - What is the difference between the spark-submit master URL and the SparkSession master URL in the main class?

When I submit a job with spark-submit, I set the master URL and give it a main class, for example:

spark-submit --class WordCount --master spark://spark:7077 my.jar

But inside this main class, my Spark context defines a different master URL:

SparkSession.builder().appName("Word2vec").master("local").

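This confuses me. If I use spark-submit to send a job to the master of a standalone cluster (spark://spark:7077), what happens? Is the SparkSession started with the local master?

When running on a cluster, should the SparkSession master URL always be the same as the spark-submit URL?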

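Best answer

There is no difference between these properties: both set the same configuration key, spark.master. If both are set, the value set directly in the application takes precedence, so in the example above .master("local") wins and the job runs locally instead of on the standalone cluster. Quoting the documentation: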

Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
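
To make the quoted precedence order concrete, here is a minimal sketch in plain Java. It is a toy model only, not Spark's actual implementation: the class `MasterPrecedence` and the method `merge` are hypothetical names made up for illustration, and the three maps stand in for spark-defaults.conf, the spark-submit flags, and the properties set in the application.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the precedence described in the quote above.
// This is NOT Spark's code; the class and method names are hypothetical.
final class MasterPrecedence {

    // Sources are merged from lowest to highest precedence:
    // each putAll overwrites keys written by the previous one.
    static Map<String, String> merge(Map<String, String> defaultsFile,
                                     Map<String, String> submitFlags,
                                     Map<String, String> setInApplication) {
        Map<String, String> effective = new LinkedHashMap<>();
        effective.putAll(defaultsFile);     // spark-defaults.conf: lowest
        effective.putAll(submitFlags);      // spark-submit flags: middle
        effective.putAll(setInApplication); // set in the application: highest
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> effective = merge(
                Map.of(),                                     // nothing in spark-defaults.conf
                Map.of("spark.master", "spark://spark:7077"), // --master spark://spark:7077
                Map.of("spark.master", "local",               // .master("local")
                       "spark.app.name", "Word2vec"));        // .appName("Word2vec")
        System.out.println(effective.get("spark.master"));    // prints "local"
    }
}
```

Under this model, dropping the `spark.master` entry from the application map (that is, not calling .master(...) in the main class) leaves the value passed to spark-submit in effect, which is usually what you want when the same jar should run both locally and on a cluster.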

This question originally appeared on Stack Overflow: https://stackoverflow.com/questions/38826670/
