scala - How to import libraries in Spark Notebook

I'm having trouble importing magellan-1.0.4-s_2.11 in Spark Notebook. I downloaded the jar from https://spark-packages.org/package/harsha2010/magellan and tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11 in the Start of Customized Settings section of the spark-notebook file in the bin folder.

Here are my imports:

import magellan.{Point, Polygon, PolyLine}
import magellan.coord.NAD83
import org.apache.spark.sql.magellan.MagellanContext
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

And my errors:

<console>:71: error: object Point is not a member of package org.apache.spark.sql.magellan
       import magellan.{Point, Polygon, PolyLine}
              ^
<console>:72: error: object coord is not a member of package org.apache.spark.sql.magellan
       import magellan.coord.NAD83
                       ^
<console>:73: error: object MagellanContext is not a member of package org.apache.spark.sql.magellan
       import org.apache.spark.sql.magellan.MagellanContext

Then I tried adding the new library to the main script the same way as any other library, like this:

$lib_dir/magellan-1.0.4-s_2.11.jar"

That didn't work, and I'm left scratching my head wondering what I'm doing wrong. How do I import a library such as magellan into Spark Notebook?

Best answer

Try evaluating something like:

:dp "harsha2010" % "magellan" % "1.0.4-s_2.11"

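This loads the library into Spark so that it can be imported, assuming it is available from one of the configured Maven repositories. In my case it failed with this message: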

failed to load 'harsha2010:magellan:jar:1.0.4-s_2.11 (runtime)' from ["Maven2 local (file:/home/dev/.m2/repository/, releases+snapshots) without authentication", "maven-central (http://repo1.maven.org/maven2/, releases+snapshots) without authentication", "spark-packages (http://dl.bintray.com/spark-packages/maven/, releases+snapshots) without authentication", "oss-sonatype (https://oss.sonatype.org/content/repositories/releases/, releases+snapshots) without authentication"] into /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786

I think the file was too large and the connection was interrupted before the whole file had been downloaded.

Workaround

So I downloaded the JAR manually from:

http://dl.bintray.com/spark-packages/maven/harsha2010/magellan/1.0.4-s_2.11/

and copied it to:

/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786/harsha2010/magellan/1.0.4-s_2.11

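After that, the :dp command worked. Try calling it first; if it fails, copy the JAR into the path named in the error message so that the command can find it.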
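The target directory above follows the standard Maven repository layout (group, then artifact, then version, with the file named artifact-version.jar), so the copy step can be sketched in Scala. This is only an illustrative sketch: mavenJarPath and installJar are hypothetical helper names, not part of Spark Notebook, and the download location and session directory in the usage comment are placeholders you would replace with your own values.

```scala
import java.nio.file.{Files, Path, Paths, StandardCopyOption}

// Hypothetical helper: builds the location a Maven-layout repository uses for a JAR:
//   <root>/<group, dots as slashes>/<artifact>/<version>/<artifact>-<version>.jar
def mavenJarPath(root: Path, group: String, artifact: String, version: String): Path =
  root
    .resolve(group.replace('.', '/'))
    .resolve(artifact)
    .resolve(version)
    .resolve(s"$artifact-$version.jar")

// Hypothetical helper: copies a manually downloaded JAR to that location,
// creating any missing parent directories and overwriting a partial download.
def installJar(downloaded: Path, root: Path, group: String, artifact: String, version: String): Path = {
  val target = mavenJarPath(root, group, artifact, version)
  Files.createDirectories(target.getParent)
  Files.copy(downloaded, target, StandardCopyOption.REPLACE_EXISTING)
}

// Example call (both paths are placeholders; take the root from your own error message):
// installJar(
//   Paths.get("/path/to/downloads/magellan-1.0.4-s_2.11.jar"),
//   Paths.get("/tmp/spark-notebook/aether/<session-directory>"),
//   "harsha2010", "magellan", "1.0.4-s_2.11")
```

The same helper would presumably work with a local M2 repository as the root (the error message above lists /home/dev/.m2/repository/ as the first location searched), though the answer only reports success with the aether directory.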

Better solution

I should have investigated why the download failed and fixed that first, or put the library in my local M2 repository. But this should get you going.

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/42686240/
