java - Avro 序列化对象不可序列化问题

标签 java serialization apache-kafka spark-streaming avro

我正在尝试通过 Spark Streaming API 使用 Kafka 生成和使用 Avro 消息。但 Avro 抛出对象而不是可序列化异常。我尝试使用 AvroKey 包装器包装数据。尽管如此,它仍然不起作用。

生产者代码:

public static final String schema = "{"
    +"\"fields\": ["
    +   " { \"name\": \"str1\", \"type\": \"string\" },"
    +   " { \"name\": \"str2\", \"type\": \"string\" },"
    +   " { \"name\": \"int1\", \"type\": \"int\" }"
    +"],"
    +"\"name\": \"myrecord\","
    +"\"type\": \"record\""
    +"}"; 

    public static void startAvroProducer() throws InterruptedException, IOException{
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "Kafka Avro Producer");

        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(AvroProducer.schema);

        AvroKey<GenericRecord> k = new AvroKey<GenericRecord>();

        GenericRecord datum = new GenericData.Record(schema);
        datum.put("str1","phani");
        datum.put("str2", "kumar");
        datum.put("int1", 1);
        k.datum(datum);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        Encoder e = EncoderFactory.get().binaryEncoder(os, null);
        writer.write(k.datum(), e);
        e.flush();
        byte[] bytedata = os.toByteArray();
        KafkaProducer<String,byte[]> producer = new KafkaProducer<String,byte[]>(props);
        ProducerRecord<String,byte[]> producerRec = new ProducerRecord<String, byte[]>("jason", bytedata);
        producer.send(producerRec);
        producer.close();
    }

消费者代码:

private static SparkConf sc = null;
private static JavaSparkContext jsc = null;
private static JavaStreamingContext jssc = null;

public static void startAvroConsumer() throws InterruptedException {
    sc = new SparkConf().setAppName("Spark Avro Streaming Consumer")
            .setMaster("local[*]");

    jsc = new JavaSparkContext(sc);
    jssc = new JavaStreamingContext(jsc, new Duration(200));

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(AvroProducer.schema);

    Set<String> topics = Collections.singleton("jason");
    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");
    kafkaParams.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
    kafkaParams.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    JavaPairInputDStream<String, byte[]> inputDstream = KafkaUtils
            .createDirectStream(jssc, String.class, byte[].class,
                    StringDecoder.class, DefaultDecoder.class, kafkaParams,
                    topics);

    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(
            schema);

    inputDstream.map(message -> {
        ByteArrayInputStream bis = new ByteArrayInputStream(message._2);
        Decoder decoder = DecoderFactory.get().binaryDecoder(bis, null);
        GenericRecord record = reader.read(null, decoder);
        String str1 = getValue(record, "str1", String.class);
        String str2 = getValue(record, "str2", String.class);
        int int1 = getValue(record, "int1", Integer.class);

        return str1 + " " + str2 + " " + int1;
    }).print();;

    jssc.start();
    jssc.awaitTermination();
}

@SuppressWarnings("unchecked")
public static <T> T getValue(GenericRecord genericRecord, String name,
        Class<T> clazz) {
    Object obj = genericRecord.get(name);
    if (obj == null)
        return null;
    if (obj.getClass() == Utf8.class) {
        return (T) obj.toString();
    }
    if (obj.getClass() == Integer.class) {
        return (T) obj;
    }
    return null;
}

异常(exception):

Caused by: java.io.NotSerializableException: org.apache.avro.generic.GenericDatumReader
Serialization stack:
    - object not serializable (class: org.apache.avro.generic.GenericDatumReader, value: org.apache.avro.generic.GenericDatumReader@7da8db47)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class com.applications.streaming.consumers.AvroConsumer, functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic com/applications/streaming/consumers/AvroConsumer.lambda$0:(Lorg/apache/avro/generic/GenericDatumReader;Lscala/Tuple2;)Ljava/lang/String;, instantiatedMethodType=(Lscala/Tuple2;)Ljava/lang/String;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)
    - object (class com.applications.streaming.consumers.AvroConsumer$$Lambda$13/1805404637, com.applications.streaming.consumers.AvroConsumer$$Lambda$13/1805404637@aa31e58)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 15 more

在阅读各种博客时,我了解到 Avro 对象没有实现可序列化接口(interface)。但是,根据下面的 jira

https://issues.apache.org/jira/browse/AVRO-1502

问题已解决。我仍然遇到此问题。

这个问题有可能解决吗?

最佳答案

您的问题是您正在从 lambda 引用以下对象 功能

GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(
        schema);

GenericDatumReader 不可序列化。你有两个选择。将对象的实例化移动到 map 函数内(不是一个好的选择)或将此对象移动为类的静态成员。这将强制为每个执行器仅创建一个新对象(每个 jvm 1 个)。考虑到您正在使用预编译模式,您可以在静态 block 中轻松创建实例。像这样

static GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(new Schema.Parser().parse(AvroProducer.schema));

static GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(AvroProducer.$SCHEMA);

关于java - Avro 序列化对象不可序列化问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/46967423/

相关文章:

c++ - 在 C++ 中将序列化的 Thrift 结构序列化到 Kafka

apache-kafka - Kafka 代理自动扩展

scala - 如何改善reactive-kafka(Scala和Akka Streams)的缓慢性能?

java - 嵌入式 Tomcat 8 无法启动

java - 每个 http 请求都会调用 WildFly 登录模块

java - 加密 AES/CBC/NoPadding 未产生正确的结果

c# - 序列化和反序列化时如何正确处理流处置和定位

java - 触发 Activiti 工作流程以执行 ReceiveTask 循环中的任务

mysql - Wordpress 使用自定义查询搜索序列化元数据

java - 读/写期间的 XmlBeans 源定位器