docker - Cannot connect to Apache Spark running in Docker

Tags: docker apache-spark pyspark

I am trying to connect from the host system to a Spark cluster running in Docker. I have tried both a Python script and spark-shell, and both give the same result (a minimal sketch of the script is shown below).
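A sketch of that kind of connection attempt, assuming a standard SparkSession pointed at the master port published on localhost:7077 (the exact script is not included in the question):

from pyspark.sql import SparkSession

# Connect from the host to the master port that the Docker container
# publishes (docker ps below shows 0.0.0.0:7077->7077/tcp on the master).
spark = (
    SparkSession.builder
    .master("spark://localhost:7077")
    .appName("docker-connect-test")
    .getOrCreate()
)

# Run a trivial job so the driver actually talks to the master and workers.
print(spark.range(100).count())

spark.stop()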
Inside Docker, the master logs show:

spark-master_1  | 20/07/24 10:13:26 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
spark-master_1  | java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 1574364215946805297, local class serialVersionUID = 6543101073799644159
spark-master_1  |   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
spark-master_1  |   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
spark-master_1  |   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
spark-master_1  |   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
spark-master_1  |   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
spark-master_1  |   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
spark-master_1  |   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
spark-master_1  |   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
spark-master_1  |   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
spark-master_1  |   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
spark-master_1  |   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
spark-master_1  |   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
spark-master_1  |   at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(Nett
Running spark-shell from the command line on the host system gives the following error:
  docker-spark-cluster git:(master) ✗ spark-shell --master spark://localhost:7077 
20/07/24 15:13:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/07/24 15:14:25 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
20/07/24 15:14:25 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
20/07/24 15:14:25 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
20/07/24 15:14:26 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2555)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
    at $line3.$read$$iw$$iw.<init>(<console>:15)
    at $line3.$read$$iw.<init>(<console>:42)
    at $line3.$read.<init>(<console>:44)
    at $line3.$read$.<init>(<console>:48)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.$print$lzycompute(<console>:7)
    at $line3.$eval$.$print(<console>:6)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
    at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
Docker containers
git:(master) ✗ docker ps
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                                                      NAMES
dfe3d47790ee        spydernaz/spark-worker:latest   "/bin/bash /start-wo…"   42 hours ago        Up 23 minutes       0.0.0.0:32769->8081/tcp                                    docker-spark-cluster_spark-worker_2
c5e36b94efdd        spydernaz/spark-worker:latest   "/bin/bash /start-wo…"   42 hours ago        Up 23 minutes       0.0.0.0:32768->8081/tcp                                    docker-spark-cluster_spark-worker_3
60f3d29e9059        spydernaz/spark-worker:latest   "/bin/bash /start-wo…"   42 hours ago        Up 23 minutes       0.0.0.0:32770->8081/tcp                                    docker-spark-cluster_spark-worker_1
d11c67d462fb        spydernaz/spark-master:latest   "/bin/bash /start-ma…"   42 hours ago        Up 23 minutes       6066/tcp, 0.0.0.0:7077->7077/tcp, 0.0.0.0:9090->8080/tcp   docker-spark-cluster_spark-master_1
➜  docker-spark-cluster git:(master) ✗ 
Spark shell command: spark-shell --master spark://localhost:7077

Best answer

As @koiralo already mentioned in the comments, this happens because the pySpark version running locally differs from the Spark version running on the server.
I hit the same error, and it was fixed once the versions matched in both places; a quick way to compare them is sketched below.
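As an illustration only (the question does not show how the versions were checked), the host-side version can be printed like this and compared against the Spark version the master prints in its startup log (or reports via spark-submit --version inside the container):

# Host side: the locally installed PySpark version must match the Spark
# version running in the Docker images.
import pyspark
print(pyspark.__version__)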

Regarding "docker - Cannot connect to Apache Spark running in Docker", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63071575/
