java - Nonexistent jar and java.lang.ClassNotFoundException when running a simple Twitter sentiment analysis program

Tags: java scala twitter apache-spark

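I have been struggling with this for a while. I am trying to run a simple Twitter sentiment analysis program that used to work but no longer does. I am using Spark 1.3.1 and Scala 2.10.4. I read somewhere that TwitterUtils does not work with Spark 1.0+, so I tried a workaround. By the book, everything seems to be in place: the correct Scala directory structure, a fat jar built with sbt assembly, and the correct path. Yet somehow Spark cannot pick up the jar file, and I also get a ClassNotFoundException.

What could be going wrong, and how can I fix it?

Edit:

Command line: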

../bin/spark-submit --class Sentimenter --master local[4] /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar

Error:

Warning: Local jar /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar does not exist, skipping.
java.lang.ClassNotFoundException: Sentimenter
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

build.sbt file:

lazy val root = (project in file(".")).
  settings(
    name := "twitter-sentiment",
    version := "1.0",
    scalaVersion := "2.10.4",
    mainClass in Compile := Some("Sentimenter")        
  )

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided",
  // %% appends the Scala binary version (2.10 here), so the artifact matches scalaVersion
  "org.apache.spark" %% "spark-streaming-twitter" % "1.3.1"
)

// META-INF discarding
val meta = """META.INF(.)*""".r
assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
  case n if n.startsWith("reference.conf") => MergeStrategy.concat
  case n if n.endsWith(".conf") => MergeStrategy.concat
  case meta(_) => MergeStrategy.discard
  case x => MergeStrategy.first
}
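
With the settings above, sbt-assembly's default output name is `<name>-assembly-<version>.jar`, written to `target/scala-2.10/` under the directory that holds build.sbt. If that directory is not `/home/ubuntu/spark/spark_examples`, the path given to spark-submit would point at nothing. Below is a minimal sketch for locating the jar, assuming a POSIX shell. `find_assembly_jars` is a hypothetical helper name, not part of sbt or Spark.

```shell
#!/bin/sh
# find_assembly_jars is a hypothetical helper, not part of sbt or Spark.
# It lists assembly jars under <project-dir>/target, where <project-dir>
# is assumed to be the directory that contains build.sbt.
find_assembly_jars() {
  project_dir="$1"

  # Without a target directory, "sbt assembly" has probably not run here.
  if [ ! -d "$project_dir/target" ]; then
    echo "no target directory under $project_dir" >&2
    return 1
  fi

  # Print the full path of every file named like <name>-assembly-<version>.jar.
  find "$project_dir/target" -type f -name '*-assembly-*.jar'
}
```

Running `find_assembly_jars` on whichever directory holds build.sbt should print the full path to pass to spark-submit, provided `sbt assembly` has completed there.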

This is the code, which I got from another forum about Twitter sentiment analysis:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkContext
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf

object Sentimenter {
  def main(args: Array[String]) {
    System.setProperty("twitter4j.oauth.consumerKey","xxxxxxxxxxxxx");
    System.setProperty("twitter4j.oauth.consumerSecret","xxxxxxxxxxxx");
    System.setProperty("twitter4j.oauth.accessToken","xxxxxxxxxxxx");
    System.setProperty("twitter4j.oauth.accessTokenSecret","xxxxxxxxxx");

    val filters = new Array[String](2)
    filters(0) = "Big Data"
    filters(1) = "geofencing"
    val sparkConf = new SparkConf().setAppName("TweetSentiment").setMaster("local[4]").set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(sparkConf)
    // get the list of positive words
    val pos_list =  sc.textFile("/home/ubuntu/spark/src/main/scala/Positive_Words.txt") //Random
      .filter(line => !line.isEmpty())
      .collect()
      .toSet
    // get the list of negative words
    val neg_list =  sc.textFile("/home/ubuntu/spark/src/main/scala/Negative_Words.txt") //Random
      .filter(line => !line.isEmpty())
      .collect()
      .toSet
    // create twitter stream
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    val stream = TwitterUtils.createStream(ssc, None, filters)
    val tweets = stream.map(r => r.getText)
    tweets.print() // print tweet text
    ssc.start()
    ssc.awaitTermination()
  }
}

Best answer

I think that if you write

../bin/spark-submit --class Sentimenter --master local[4] --jars /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar

or try rearranging the flags in spark-submit, for example:

../bin/spark-submit --jars /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar --master local[4] --class Sentimenter

you can get it to work.
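
Whichever flag order is used, the warning in the question suggests that no file exists at the given path, so it may be worth checking the jar itself first. The sketch below assumes a POSIX shell with `unzip` installed. `check_jar` is a hypothetical helper name, not a Spark tool. It reports whether the file exists and whether it appears to contain the main class.

```shell
#!/bin/sh
# check_jar is a hypothetical helper, not part of Spark.
# Usage: check_jar <path-to-jar> <main-class-name>
check_jar() {
  jar_path="$1"
  main_class="$2"

  # spark-submit warns "does not exist, skipping" when no file is at this path.
  if [ ! -f "$jar_path" ]; then
    echo "missing jar: $jar_path"
    return 1
  fi

  # A jar is a zip archive; class a.b.C is stored as a/b/C.class.
  class_file="$(printf '%s' "$main_class" | tr '.' '/').class"

  # Fixed-string substring match against the archive listing (assumes unzip is installed).
  if unzip -l "$jar_path" | grep -qF -- "$class_file"; then
    echo "found $class_file in $jar_path"
  else
    echo "class not found in jar: $class_file"
    return 2
  fi
}
```

For the command in the question this would be called as `check_jar /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar Sentimenter`. A return code of 1 would match the "does not exist" warning, while 2 would point at the build rather than the path.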

A similar question appears on Stack Overflow: https://stackoverflow.com/questions/31581006/
