java - Task not serializable in Spark

Tags: java scala serialization

I have a transformation like this:

JavaRDD<Tuple2<String, Long>> mappedRdd = myRDD.values().map(
    new Function<Pageview, Tuple2<String, Long>>() {
      @Override
      public Tuple2<String, Long> call(Pageview pageview) throws Exception {
        String key = pageview.getUrl().toString();
        Long value = getDay(pageview.getTimestamp());
        return new Tuple2<>(key, value);
      }
    });

Pageview is of the following type: Pageview.java

Then I register that class with Spark:

Class[] c = new Class[1];
c[0] = Pageview.class;
sparkConf.registerKryoClasses(c);

However, I get this error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
    at org.apache.spark.rdd.RDD.map(RDD.scala:286)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:89)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
    at org.apache.gora.tutorial.log.ExampleSpark.run(ExampleSpark.java:100)
    at org.apache.gora.tutorial.log.ExampleSpark.main(ExampleSpark.java:53)
Caused by: java.io.NotSerializableException: org.apache.gora.tutorial.log.ExampleSpark
Serialization stack:
    - object not serializable (class: org.apache.gora.tutorial.log.ExampleSpark, value: org.apache.gora.tutorial.log.ExampleSpark@1a2b4497)
    - field (class: org.apache.gora.tutorial.log.ExampleSpark$1, name: this$0, type: class org.apache.gora.tutorial.log.ExampleSpark)
    - object (class org.apache.gora.tutorial.log.ExampleSpark$1, org.apache.gora.tutorial.log.ExampleSpark$1@4ab2775d)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
    ... 7 more

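When I debug the code, I see that JavaSerializer.scala is being called, even though there is a class named KryoSerializer.

PS 1: I don't want to use the Java serializer, but implementing Serializer on Pageview does not solve the problem.

PS 2: This does not solve the problem either: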

...
//String key = pageview.getUrl().toString();
//Long value = getDay(pageview.getTimestamp());
String key = "Dummy";
Long value = 1L;
return new Tuple2<>(key, value);
...

Best answer

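I have hit this issue several times with Java code. The serialization stack in the trace shows the cause: the anonymous Function (ExampleSpark$1) holds a this$0 field that points to the enclosing ExampleSpark instance, and ExampleSpark is not serializable. Registering Pageview with Kryo does not change this, because the closure itself goes through the Java serializer, as the JavaSerializerInstance frames in the trace show.

Even though I was using Java serialization, I would either make the class that contains the code Serializable or, if you don't want to do that, make the Function a static member of the class.

Here is a code snippet of the second solution.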

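import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class Test {
  // Declared in a static context, so the anonymous class has no hidden
  // this$0 reference to an enclosing Test instance.
  private static final Function<Pageview, Tuple2<String, Long>> s =
      new Function<Pageview, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> call(Pageview pageview) throws Exception {
          String key = pageview.getUrl().toString();
          // getDay must itself be a static method to be callable from here.
          Long value = getDay(pageview.getTimestamp());
          return new Tuple2<>(key, value);
        }
      };
}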
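
The difference between the two variants can be reproduced without Spark. The following is only a minimal sketch using plain JDK serialization: SerFunction is a hypothetical stand-in for Spark's Function interface (which is also Serializable), Driver plays the role of the non-serializable ExampleSpark class, and both getDay methods are invented for illustration, since the original getDay is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureCaptureDemo {

    // Hypothetical stand-in for org.apache.spark.api.java.function.Function.
    public interface SerFunction<T, R> extends Serializable {
        R call(T input);
    }

    // Deliberately NOT Serializable, like ExampleSpark in the stack trace.
    public static class Driver {
        private static final long MILLIS_PER_DAY = 86_400_000L;

        // Assumed instance helper; calling it forces a reference to Driver.this.
        long getDay(long timestampMillis) {
            return timestampMillis / MILLIS_PER_DAY;
        }

        // Assumed static variant of the same helper.
        static long getDayStatic(long timestampMillis) {
            return timestampMillis / MILLIS_PER_DAY;
        }

        // Anonymous class created in an instance method: it carries a hidden
        // this$0 field pointing at the enclosing Driver.
        public SerFunction<Long, Long> asAnonymousInstanceMember() {
            return new SerFunction<Long, Long>() {
                @Override
                public Long call(Long timestampMillis) {
                    return getDay(timestampMillis);
                }
            };
        }

        // Anonymous class created in a static context: no enclosing instance.
        public static final SerFunction<Long, Long> AS_STATIC_MEMBER =
            new SerFunction<Long, Long>() {
                @Override
                public Long call(Long timestampMillis) {
                    return getDayStatic(timestampMillis);
                }
            };
    }

    // Returns true if Java serialization can write the object graph.
    public static boolean isJavaSerializable(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fails: serializing the function drags in the non-serializable Driver.
        System.out.println(isJavaSerializable(new Driver().asAnonymousInstanceMember()));
        // Succeeds: the static member has nothing else to serialize.
        System.out.println(isJavaSerializable(Driver.AS_STATIC_MEMBER));
    }
}
```

Making Driver implement Serializable would also let the first variant pass, at the cost of shipping the whole enclosing object along with the function, which is why the static member is usually the lighter option.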

Regarding java - Task not serializable in Spark, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31105400/
