apache-spark - How to use TwitterUtils in the Spark shell?

I'm trying to use TwitterUtils in the Spark shell, where it is not available by default.

I added the following to spark-env.sh:

SPARK_CLASSPATH="/disk.b/spark-master-2014-07-28/external/twitter/target/spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar"

Now I can run

import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._

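in the shell without errors, which would not be possible without the jar on the classpath ("error: object twitter is not a member of package org.apache.spark.streaming").
However, I get an error when I run the following in the Spark shell: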

scala> val ssc = new StreamingContext(sc, Seconds(1))
ssc: org.apache.spark.streaming.StreamingContext =
org.apache.spark.streaming.StreamingContext@6e78177b

scala> val tweets = TwitterUtils.createStream(ssc, "twitter.txt")
error: bad symbolic reference. A signature in TwitterUtils.class refers to
term twitter4j in package <root> which is not available.
It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling
TwitterUtils.class.

What am I missing? Do I have to import another jar?

Best answer

Yes, in addition to the spark-streaming-twitter JAR you already have, you also need the Twitter4J JARs. Specifically, the Spark devs suggest using Twitter4J version 3.0.3.

Once you have downloaded the right JARs, pass them to the shell via the --jars flag. I think you can also do this via SPARK_CLASSPATH.
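As a rough sketch of the two options: --jars takes a comma-separated list of JAR paths, while SPARK_CLASSPATH is an ordinary colon-separated classpath. The helper name join_jars below is hypothetical (it is not part of Spark), and the directory in the usage comments is simply the one used in the script further down.

```shell
# join_jars SEP DIR: print every *.jar found in DIR, joined by SEP.
# Hypothetical helper for illustration; runs in a subshell so its
# variables do not leak into the calling shell.
join_jars() (
  sep=$1
  dir=$2
  out=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue        # skip the literal pattern when nothing matches
    out="${out:+$out$sep}$jar"       # add the separator only between entries
  done
  printf '%s\n' "$out"
)

# Usage (assumed paths):
#   /root/spark/bin/spark-shell --jars "$(join_jars , /root/spark/lib/twitter4j)"
#   SPARK_CLASSPATH="$(join_jars : /root/spark/lib/twitter4j)"
```

Quoting the command substitution keeps the whole list as a single argument even if a path contains spaces.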

Here's what I did on a Spark EC2 cluster:

#!/bin/bash
cd /root/spark/lib
mkdir twitter4j

# Get the Spark Streaming JAR.
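# -o names the output file explicitly (the URL ends in a query string); -L follows redirects.
curl -L -o "spark-streaming-twitter_2.10-1.0.0.jar" "http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-streaming-twitter_2.10/1.0.0/spark-streaming-twitter_2.10-1.0.0.jar"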

# Get the Twitter4J JARs. Check out http://twitter4j.org/archive/ for other versions.
TWITTER4J_SOURCE=twitter4j-3.0.3.zip
curl -O "http://twitter4j.org/archive/$TWITTER4J_SOURCE"
unzip -j ./$TWITTER4J_SOURCE "lib/*.jar" -d twitter4j/
rm $TWITTER4J_SOURCE

cd
# Point the shell to these JARs and go!
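# ls -m separates entries with ", "; strip the spaces as well as the newlines
# so the list reaches --jars as a single argument.
TWITTER4J_JARS=`ls -m /root/spark/lib/twitter4j/*.jar | tr -d ' \n'`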
/root/spark/bin/spark-shell --jars /root/spark/lib/spark-streaming-twitter_2.10-1.0.0.jar,$TWITTER4J_JARS
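
Before launching the shell, it can help to confirm that every path in the --jars list actually exists, since a typo in one path would leave the Twitter4J classes off the classpath again. A minimal sketch, assuming a hypothetical helper named missing_jars (not a Spark command):

```shell
# missing_jars LIST: print each entry of a comma-separated LIST that is
# not a regular file. Hypothetical helper; the subshell keeps the IFS and
# globbing changes local to this function.
missing_jars() (
  IFS=,
  set -f                             # no glob expansion while splitting on commas
  for jar in $1; do
    [ -f "$jar" ] || printf '%s\n' "$jar"
  done
)

# Usage (assumed variable from the script above):
#   missing_jars "/root/spark/lib/spark-streaming-twitter_2.10-1.0.0.jar,$TWITTER4J_JARS"
```

No output means every listed JAR was found.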

Source: Stack Overflow, https://stackoverflow.com/questions/25085128/
