Hadoop error when using spark-submit

Tags: hadoop apache-spark amazon-ec2 spark-ec2

I am trying to run spark-submit against Amazon EC2 as follows:

spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 --master spark://amazonaws.com SimpleApp.py

I end up with the following error. It seems to be looking for Hadoop. My EC2 cluster was created with the spark-ec2 command.

Ivy Default Cache set to: /home/adas/.ivy2/cache
The jars for the packages stored in: /home/adas/.ivy2/jars
:: loading settings :: url = jar:file:/home/adas/spark/spark-2.1.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 66439ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.apache.hadoop#hadoop-aws;2.7.1

    ==== local-m2-cache: tried

      file:/home/adas/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      file:/home/adas/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

    ==== local-ivy-cache: tried

      /home/adas/.ivy2/local/org.apache.hadoop/hadoop-aws/2.7.1/ivys/ivy.xml

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      /home/adas/.ivy2/local/org.apache.hadoop/hadoop-aws/2.7.1/jars/hadoop-aws.jar

    ==== central: tried

      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.apache.hadoop#hadoop-aws;2.7.1: not found

        ::::::::::::::::::::::::::::::::::::::::::::::


:::: ERRORS
    Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar (java.net.NoRouteToHostException: No route to host (Host unreachable))


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hadoop#hadoop-aws;2.7.1: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Best answer

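You are submitting the job with the --packages org.apache.hadoop:hadoop-aws:2.7.1 option, so spark-submit tries to resolve the dependency by downloading the package from the public Maven repositories. However, this error shows that it cannot reach the Maven repository: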

Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))
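A --packages coordinate of the form groupId:artifactId:version maps onto a fixed path in a Maven repository, which is why the log lists the exact .pom and .jar URLs that failed. The Python sketch below rebuilds those URLs; the helper name maven_artifact_urls is hypothetical, and it only handles plain groupId:artifactId:version coordinates (no classifiers). Fetching one of the printed URLs by hand from the same machine is a quick way to confirm the problem is network access rather than a missing artifact.

```python
# Hypothetical helper (not part of Spark): rebuild the URLs that the Ivy
# resolver in spark-submit tries for a --packages coordinate, using the
# standard Maven repository layout visible in the log.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_artifact_urls(coordinate: str, repo: str = MAVEN_CENTRAL) -> dict:
    """Return the .pom and .jar URLs for a groupId:artifactId:version coordinate."""
    # Raises ValueError if the coordinate does not have exactly three parts.
    group_id, artifact_id, version = coordinate.split(":")
    # The groupId dots become directory separators: org.apache.hadoop -> org/apache/hadoop
    base = "/".join([repo.rstrip("/"), group_id.replace(".", "/"), artifact_id, version])
    stem = f"{artifact_id}-{version}"
    return {"pom": f"{base}/{stem}.pom", "jar": f"{base}/{stem}.jar"}


if __name__ == "__main__":
    for kind, url in maven_artifact_urls("org.apache.hadoop:hadoop-aws:2.7.1").items():
        print(kind, url)
```

For hadoop-aws 2.7.1 this yields the same two repo1.maven.org URLs listed under "central: tried" in the log.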

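You may want to check whether the machine you run spark-submit on (for example, the Spark master) can access the internet, since the dependencies are resolved by the spark-submit process itself, as the stack trace (SparkSubmit.prepareSubmitEnvironment) shows.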
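One minimal way to run that check, assuming Python 3 is available on the machine that executes spark-submit, is to try opening a TCP connection to the repository host. The function name can_reach and the 5-second timeout are illustrative choices, not part of Spark; a host that needs an HTTP proxy will also show up as unreachable here.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, refused connections and
        # "No route to host" (the NoRouteToHostException seen in the log).
        return False


if __name__ == "__main__":
    # hadoop-aws is published on Maven Central, so this is the host that matters.
    host, port = "repo1.maven.org", 443
    status = "reachable" if can_reach(host, port) else "UNREACHABLE"
    print(f"{host}:{port} -> {status}")
```

If repo1.maven.org is reported as unreachable, that host needs outbound access to the repository before --packages can resolve hadoop-aws.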

A similar question about this Hadoop error with spark-submit can be found on Stack Overflow: https://stackoverflow.com/questions/41725301/
