go - KafkaIO连接器/Apache Beam Transform“运行” SDK是否可用?

标签 go apache-kafka apache-beam-io beam

我正在使用Apache Beam“go” SDK构建数据提取管道。

我的管道是使用来自Kafka队列的数据,并将数据持久保存到Google Cloud Bigtable(和/或另一个Kafka主题)。

到目前为止,我还找不到以“go”编写的Kafka IO连接器(也称为Apache I / O转换)(但是,我可以找到Java版本)。

以下是受支持的Apache Beam内置I / O转换的链接:
https://beam.apache.org/documentation/io/built-in/

我正在寻找以下Java代码的“go”等效项:

    pipeline.apply("kafka_deserialization", KafkaIO.<String, String>read()
		.withBootstrapServers(KAFKA_BROKER)
		.withTopic(KAFKA_TOPIC)
		.withConsumerConfigUpdates(CONSUMER_CONFIG)
		.withKeyDeserializer(StringDeserializer.class)
		.withValueDeserializer(StringDeserializer.class))


您是否有有关KafkaIO Connector“go” SDK /库的可用性的信息?

任何帮助或信息将不胜感激。

谢谢。

最佳答案

@ cricket_007如果您也感到好奇,我收到了Apache Beam团队的Robert Burke(rebo@google.com)的以下更新:

There presently isn't a Kafka transform for Go. 

The Go SDK is still experimental, largely due to scalable IO support, which is why the Go SDK isn't represented in the built-in io page.

There's presently no way for an SDK user to write a Streaming source in the Go SDK, since there's no mechanism for a DoFn to "self terminate" bundles, such as to allow for scalability and windowing from streaming sources. 

However, SplittableDoFns are on their way, and will eventually be the solution for writing these.

At present, the Beam Go SDK IOs haven't been tested and vetted for production use. Until the initial SplittableDoFn support is added to the Go SDK, Batch transforms cannot split, and can't scale beyond a single worker thread. This batch version should land in the next few months, and the streaming version land a few months after that, after which a Kafka IO can be developed. 

I wish I had better news for you, but I can say progress is being made.

Robert Burke

关于go - KafkaIO连接器/Apache Beam Transform“运行” SDK是否可用?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59905546/

相关文章:

arrays - 语法错误 : unexpected name, 期待)

go - 将整数放入缓冲区并用零填充左边?

go - 在Go中将多个返回值转换/折叠为结构

apache-kafka - Kafka 段删除过于频繁或根本不删除

python - 在 apache beam DirectRunner 中使用 KafkaIO 时出错

date - 如何解析以下格式的日期/时间?

java - 如何将 Project Reactor 的调度程序与基于 Executor 的库一起使用?

java - Http 请求的 Spring Kafka 监听器

python - ReadFromKafka 抛出 ValueError : Unsupported signal: 2

apache-beam-io - 在 Apache Beam 中使用 PAssert containsInAnyOrder 比较对象