apache-spark - JDBC driver not found - Spark submit to YARN

Tags: apache-spark hadoop-yarn apache-spark-sql

I am trying to read all rows from a database table and write them into another, empty target table. When I issue the following command on the master node, it works as expected:

$./bin/spark-submit --class cs.TestJob_publisherstarget --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar --jars ./lib/mysql-connector-java-5.1.35-bin.jar,./lib/univocity-parsers-1.5.6.jar,./lib/commons-csv-1.1.1-SNAPSHOT.jar ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar

(where uber-ski-spark-job-0.0.1-SNAPSHOT.jar is the packaged jar in the ../spark/lib folder and cs.TestJob_publisherstarget is the main class)

The command above works perfectly: the code reads all rows from the table in MySQL and dumps them into the target table, using the JDBC driver passed via the --jars option.
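For reference, here is a minimal sketch of what the read/write side of such a job can look like, assuming the Spark 1.3-era API that appears in the stack trace further down (SQLContext.load); the URL and credentials are copied from the log below, while the table names are placeholders rather than details from the original job:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JdbcCopyJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("jdbc-copy"))
        val sqlContext = new SQLContext(sc)

        // The optional "driver" key makes Spark call Class.forName on the
        // given class before opening connections, so the lookup does not
        // depend on java.sql.DriverManager scanning the classpath alone.
        val df = sqlContext.load("jdbc", Map(
          "url"     -> "jdbc:mysql://localhost:3306/pubs?user=root&password=root",
          "dbtable" -> "source_table",   // placeholder
          "driver"  -> "com.mysql.jdbc.Driver"))

        // Append every row into the (pre-created) target table.
        df.insertIntoJDBC(
          "jdbc:mysql://localhost:3306/pubs?user=root&password=root",
          "target_table",                // placeholder
          overwrite = false)
      }
    }

Even with the driver class named explicitly, the jar still has to be visible on the classpath of whichever JVM actually opens the connection, which is what the classpath options to spark-submit control.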

Here is the problem:

With everything else exactly as above, when I submit the same job to YARN it fails with an exception indicating that the driver could not be found:

$./bin/spark-submit --verbose --class cs.TestJob_publisherstarget --master yarn-cluster --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar --jars ./lib/mysql-connector-java-5.1.35-bin.jar ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar

Exception on the YARN console:

Error: application failed with exception
org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:625)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:650)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Exception in the logs:

15/10/12 20:38:59 ERROR yarn.ApplicationMaster: User class threw exception: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:96)
    at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:133)
    at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
    at com.cambridgesemantics.application.sdi.compiler.spark.DataSource.getDataFrame(DataSource.scala:20)
    at cs.TestJob_publisherstarget$.main(TestJob_publisherstarget.scala:29)
    at cs.TestJob_publisherstarget.main(TestJob_publisherstarget.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:484)
15/10/12 20:38:59 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root)

In any case: where am I supposed to put the JDBC driver jar file? I have already copied it into the lib directory on every child node, and still no luck!

Best Answer

I ran into the same problem: it worked in local mode but not in yarn-client mode.

I added this to my spark-submit command:

--conf "spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.34.jar

That worked for me.
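For completeness, a full yarn-cluster submission combining that setting with the options from the question might look like the following sketch. The /path/to/... location is illustrative; the jar must actually exist at that path on every node (which matches the question, where the driver had already been copied to each child node). Adding spark.driver.extraClassPath as well covers the driver side: in yarn-cluster mode the driver runs inside the Application Master on a cluster node, so a --driver-class-path entry that only exists on the submitting machine does not resolve there.

$ ./bin/spark-submit --verbose --class cs.TestJob_publisherstarget \
    --master yarn-cluster \
    --conf "spark.driver.extraClassPath=/path/to/mysql-connector-java-5.1.35-bin.jar" \
    --conf "spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.35-bin.jar" \
    --jars ./lib/mysql-connector-java-5.1.35-bin.jar \
    ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar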

Regarding apache-spark - JDBC driver not found - Spark submit to YARN, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33089900/
