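I am trying to produce and consume Avro messages with Kafka through the Spark Streaming API, but Avro throws an "object not serializable" exception (java.io.NotSerializableException). I tried wrapping the data in an AvroKey wrapper, but it still does not work.
Producer code: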
public static final String schema = "{"
+"\"fields\": ["
+ " { \"name\": \"str1\", \"type\": \"string\" },"
+ " { \"name\": \"str2\", \"type\": \"string\" },"
+ " { \"name\": \"int1\", \"type\": \"int\" }"
+"],"
+"\"name\": \"myrecord\","
+"\"type\": \"record\""
+"}";
public static void startAvroProducer() throws InterruptedException, IOException{
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "Kafka Avro Producer");
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(AvroProducer.schema);
AvroKey<GenericRecord> k = new AvroKey<GenericRecord>();
GenericRecord datum = new GenericData.Record(schema);
datum.put("str1","phani");
datum.put("str2", "kumar");
datum.put("int1", 1);
k.datum(datum);
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(os, null);
writer.write(k.datum(), e);
e.flush();
byte[] bytedata = os.toByteArray();
KafkaProducer<String,byte[]> producer = new KafkaProducer<String,byte[]>(props);
ProducerRecord<String,byte[]> producerRec = new ProducerRecord<String, byte[]>("jason", bytedata);
producer.send(producerRec);
producer.close();
}
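To make concrete what the producer puts on the topic, the sketch below hand-encodes the sample record using the rules from the binary encoding section of the Avro specification: int and long values as zig-zag variable-length integers, a string as its UTF-8 length followed by the bytes, and a record as its fields concatenated in schema order. It uses only the JDK and is an illustration, not the Avro library; AvroBinarySketch and its methods are hypothetical names. In the real producer, GenericDatumWriter with EncoderFactory.get().binaryEncoder(...) does this work.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, JDK-only illustration of Avro binary encoding; NOT the Avro library.
final class AvroBinarySketch {

    // Avro writes int and long as zig-zag encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long zz = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes become small unsigned values
        while ((zz & ~0x7FL) != 0) {
            out.write((int) ((zz & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            zz >>>= 7;
        }
        out.write((int) zz); // last byte, continuation bit clear
    }

    // A string is its UTF-8 byte length (encoded as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields in schema order: str1, str2, int1.
    // No field names and no schema are written.
    static byte[] encodeMyRecord(String str1, String str2, int int1) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, str1);
        writeString(out, str2);
        writeLong(out, int1); // int uses the same zig-zag varint scheme as long
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeMyRecord("phani", "kumar", 1);
        System.out.println(bytes.length + " bytes: " + Arrays.toString(bytes));
    }
}
```

The payload carries no field names and no schema, which is why the consumer has to parse the same schema (AvroProducer.schema) before it can read the bytes back.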
Consumer code:
private static SparkConf sc = null;
private static JavaSparkContext jsc = null;
private static JavaStreamingContext jssc = null;
public static void startAvroConsumer() throws InterruptedException {
sc = new SparkConf().setAppName("Spark Avro Streaming Consumer")
.setMaster("local[*]");
jsc = new JavaSparkContext(sc);
jssc = new JavaStreamingContext(jsc, new Duration(200));
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(AvroProducer.schema);
Set<String> topics = Collections.singleton("jason");
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "localhost:9092");
kafkaParams.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer",
"org.apache.kafka.common.serialization.ByteArrayDeserializer");
JavaPairInputDStream<String, byte[]> inputDstream = KafkaUtils
.createDirectStream(jssc, String.class, byte[].class,
StringDecoder.class, DefaultDecoder.class, kafkaParams,
topics);
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(
schema);
inputDstream.map(message -> {
ByteArrayInputStream bis = new ByteArrayInputStream(message._2);
Decoder decoder = DecoderFactory.get().binaryDecoder(bis, null);
GenericRecord record = reader.read(null, decoder);
String str1 = getValue(record, "str1", String.class);
String str2 = getValue(record, "str2", String.class);
int int1 = getValue(record, "int1", Integer.class);
return str1 + " " + str2 + " " + int1;
}).print();
jssc.start();
jssc.awaitTermination();
}
@SuppressWarnings("unchecked")
public static <T> T getValue(GenericRecord genericRecord, String name,
Class<T> clazz) {
Object obj = genericRecord.get(name);
if (obj == null)
return null;
if (obj.getClass() == Utf8.class) {
return (T) obj.toString();
}
if (obj.getClass() == Integer.class) {
return (T) obj;
}
return null;
}
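For the reverse direction, here is a similarly hypothetical, JDK-only sketch of what reading the payload involves: undo the zig-zag varints, read the length-prefixed strings in the writer's field order, and format the result the way the consumer's map function does. AvroBinaryDecodeSketch is a made-up name for illustration; it does no schema resolution and no error handling for malformed input. In the consumer, GenericDatumReader.read does this work.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, JDK-only reader for the illustration; NOT Avro's GenericDatumReader.
final class AvroBinaryDecodeSketch {
    private final byte[] buf;
    private int pos = 0;

    AvroBinaryDecodeSketch(byte[] buf) {
        this.buf = buf;
    }

    // Reads a variable-length integer, then reverses the zig-zag mapping.
    long readLong() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift; // accumulate 7 bits at a time
            shift += 7;
        } while ((b & 0x80) != 0); // continuation bit set: more bytes follow
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    // Reads a length prefix, then that many UTF-8 bytes.
    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    // Reads fields in the order the writer's schema declares them and
    // formats them like the consumer's map function.
    static String decodeMyRecord(byte[] payload) {
        AvroBinaryDecodeSketch d = new AvroBinaryDecodeSketch(payload);
        String str1 = d.readString();
        String str2 = d.readString();
        int int1 = (int) d.readLong();
        return str1 + " " + str2 + " " + int1;
    }

    public static void main(String[] args) {
        byte[] payload = {10, 112, 104, 97, 110, 105, 10, 107, 117, 109, 97, 114, 2};
        System.out.println(decodeMyRecord(payload));
    }
}
```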
Exception:
Caused by: java.io.NotSerializableException: org.apache.avro.generic.GenericDatumReader
Serialization stack:
- object not serializable (class: org.apache.avro.generic.GenericDatumReader, value: org.apache.avro.generic.GenericDatumReader@7da8db47)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class com.applications.streaming.consumers.AvroConsumer, functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic com/applications/streaming/consumers/AvroConsumer.lambda$0:(Lorg/apache/avro/generic/GenericDatumReader;Lscala/Tuple2;)Ljava/lang/String;, instantiatedMethodType=(Lscala/Tuple2;)Ljava/lang/String;, numCaptured=1])
- writeReplace data (class: java.lang.invoke.SerializedLambda)
- object (class com.applications.streaming.consumers.AvroConsumer$$Lambda$13/1805404637, com.applications.streaming.consumers.AvroConsumer$$Lambda$13/1805404637@aa31e58)
- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 15 more
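From reading various blogs, I understand that Avro objects do not implement the Serializable interface. However, according to the following JIRA
https://issues.apache.org/jira/browse/AVRO-1502
the issue has been resolved, yet I still run into this problem.
Is there any way to solve it?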
Best answer
Your problem is that you are referencing the following object from inside the lambda function:
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(
schema);
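GenericDatumReader is not serializable. You have two options: move the instantiation of the object inside the map function (not a good option, since a new reader would then be created for every record), or make the object a static member of the class. A static member is created only once per executor (one per JVM), and because the lambda reads a static field directly instead of capturing it, the reader never has to be serialized. Since your schema is fixed at compile time, you can easily create the instance in a static initializer, like this: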
static GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(new Schema.Parser().parse(AvroProducer.schema));
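Or, if the schema is already available as a parsed Schema constant (Avro-generated classes expose one as the static field SCHEMA$):
// assumes AvroProducer exposes a static Schema field named SCHEMA$, as Avro-generated classes do
static GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(AvroProducer.SCHEMA$);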
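The failure and both fixes can be reproduced without Spark or Avro, as a minimal sketch under two assumptions: SerializableFunction stands in for Spark's org.apache.spark.api.java.function.Function (which extends java.io.Serializable), and the hypothetical FakeDatumReader stands in for GenericDatumReader by deliberately not implementing Serializable. isSerializable runs plain Java serialization, which is what the stack trace shows Spark doing through JavaSerializerInstance.serialize. A lambda that captures a local reader fails, exactly as in the SerializedLambda capturedArgs entry of the trace, while the static-field and create-inside variants capture nothing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

final class ClosureCaptureDemo {

    // Stand-in for Spark's Java Function interface, which extends Serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical stand-in for GenericDatumReader: deliberately NOT Serializable.
    static final class FakeDatumReader {
        String read(byte[] payload) {
            return payload.length + " bytes";
        }
    }

    // Initialized once per JVM (per class loader) when the class is loaded.
    static final FakeDatumReader READER = new FakeDatumReader();

    // Plain Java serialization of the closure; false if something inside is not Serializable.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The question's pattern: the lambda captures a non-serializable local variable.
    static SerializableFunction<byte[], String> capturingFn() {
        FakeDatumReader reader = new FakeDatumReader();
        return payload -> reader.read(payload);
    }

    // Static-member option: the lambda reads the static field, so nothing is captured.
    static SerializableFunction<byte[], String> staticFieldFn() {
        return payload -> READER.read(payload);
    }

    // Create-inside option: serializable, but builds a new reader for every record.
    static SerializableFunction<byte[], String> createInsideFn() {
        return payload -> new FakeDatumReader().read(payload);
    }

    public static void main(String[] args) {
        System.out.println("captured local: " + isSerializable(capturingFn()));
        System.out.println("static field:   " + isSerializable(staticFieldFn()));
        System.out.println("create inside:  " + isSerializable(createInsideFn()));
    }
}
```

Both fixes serialize, but only the static-field version also keeps a single reader per executor JVM, which is why the answer prefers it.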
For more on "java - Avro serialized object not serializable issue", see the similar question on Stack Overflow: https://stackoverflow.com/questions/46967423/