scala - Spark-submit java.lang.ClassNotFoundException

Tags: scala apache-spark

I am trying to run spark-submit in standalone mode. My project compiles successfully in IntelliJ IDEA, and I have built the corresponding jar file, but when I run the following command:

[cloudera@quickstart bin]$ spark-submit --verbose --class graphx /home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar /usr/lib/spark/logs/temp.log

I get the following output and error message:

Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.yarn.historyServer.address=http://quickstart.cloudera:18088
Adding default property: spark.dynamicAllocation.schedulerBacklogTimeout=1
Adding default property: spark.yarn.am.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.shuffle.service.port=7337
Adding default property: spark.master=yarn-client
Adding default property: spark.authenticate=false
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.eventLog.dir=hdfs://quickstart.cloudera:8020/user/spark/applicationHistory
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.dynamicAllocation.minExecutors=0
Adding default property: spark.dynamicAllocation.executorIdleTimeout=60
Adding default property: spark.yarn.jar=local:/usr/lib/spark/lib/spark-assembly.jar
Parsed arguments:
  master                  yarn-client
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               graphx
  primaryResource         file:/home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar
  name                    graphx
  childArgs               [/usr/lib/spark/logs/temp.log]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
  spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.yarn.jar -> local:/usr/lib/spark/lib/spark-assembly.jar
  spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.authenticate -> false
  spark.yarn.historyServer.address -> http://quickstart.cloudera:18088
  spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.eventLog.enabled -> true
  spark.dynamicAllocation.schedulerBacklogTimeout -> 1
  spark.serializer -> org.apache.spark.serializer.KryoSerializer
  spark.dynamicAllocation.executorIdleTimeout -> 60
  spark.dynamicAllocation.minExecutors -> 0
  spark.shuffle.service.enabled -> true
  spark.shuffle.service.port -> 7337
  spark.eventLog.dir -> hdfs://quickstart.cloudera:8020/user/spark/applicationHistory
  spark.master -> yarn-client
  spark.dynamicAllocation.enabled -> true


Main class:
graphx
Arguments:
/usr/lib/spark/logs/temp.log
System properties:
spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.yarn.jar -> local:/usr/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.authenticate -> false
spark.yarn.historyServer.address -> http://quickstart.cloudera:18088
spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.eventLog.enabled -> true
spark.dynamicAllocation.schedulerBacklogTimeout -> 1
SPARK_SUBMIT -> true
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled -> true
spark.dynamicAllocation.minExecutors -> 0
spark.dynamicAllocation.executorIdleTimeout -> 60
spark.app.name -> graphx
spark.jars -> file:/home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar
spark.submit.deployMode -> client
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://quickstart.cloudera:8020/user/spark/applicationHistory
spark.master -> yarn-client
spark.dynamicAllocation.enabled -> true
Classpath elements:
file:/home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar


java.lang.ClassNotFoundException: graphx
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

My question is: where does this package need to be placed? Right now the jar sits under my IntelliJ IDEA project path; should I copy it to some other path under /usr/lib/spark/?

Thanks!

Best answer

You have to provide the fully qualified class name to spark-submit.

Assuming your package name is com.me.application, the spark-submit command should look something like the example below.
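For example (Main here is a hypothetical driver object; substitute your own class name and jar path):

spark-submit --class com.me.application.Main /path/to/your-app.jar <args>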

EDIT

As pointed out in the comments, your class is named FormatDataTlf and its package is tlf, so the main class is not graphx; the submit command becomes:

spark-submit --class tlf.FormatDataTlf ....
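For reference, a minimal sketch of what the driver source would need to declare for tlf.FormatDataTlf to resolve (the body is illustrative; only the package and object names matter for --class):

package tlf

import org.apache.spark.{SparkConf, SparkContext}

// The fully qualified name passed to --class is <package>.<object>: tlf.FormatDataTlf
object FormatDataTlf {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FormatDataTlf")
    val sc = new SparkContext(conf)
    // args(0) is the log file path passed after the jar on the command line
    val lines = sc.textFile(args(0))
    println(s"Read ${lines.count()} lines from ${args(0)}")
    sc.stop()
  }
}

You can also confirm which fully qualified names are actually packaged in your jar by listing its contents:

jar tf /home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar | grep '\.class$'

An entry such as tlf/FormatDataTlf.class (path separators map to package dots) is a valid --class target.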

Regarding "scala - Spark-submit java.lang.ClassNotFoundException", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37045061/
