I am building a data-ingestion pipeline with the Apache Beam Go SDK.
My pipeline consumes data from a Kafka queue and persists the data to Google Cloud Bigtable (and/or another Kafka topic).
So far I have been unable to find a Kafka IO connector (i.e., an Apache Beam I/O transform) written in Go (I can, however, find a Java version).
Here is the link to the supported built-in Apache Beam I/O transforms:
https://beam.apache.org/documentation/io/built-in/
I am looking for the Go equivalent of the following Java code:
pipeline.apply("kafka_deserialization", KafkaIO.<String, String>read()
    .withBootstrapServers(KAFKA_BROKER)
    .withTopic(KAFKA_TOPIC)
    .withConsumerConfigUpdates(CONSUMER_CONFIG)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class));
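For illustration, if a Go KafkaIO existed, I would expect the usage to look roughly like the sketch below. To be clear, this is purely hypothetical: the package path, function names, and option helpers are invented to mirror the Java builder API, and no such package exists in the Beam Go SDK.

```
// HYPOTHETICAL sketch only -- kafkaio does not exist in the Beam Go SDK.
// All names below are invented to mirror the Java KafkaIO builder API.
package main

import (
    "context"

    "github.com/apache/beam/sdks/go/pkg/beam"
    "github.com/apache/beam/sdks/go/pkg/beam/io/kafkaio" // hypothetical package
    "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

func main() {
    beam.Init()
    p := beam.NewPipeline()
    s := p.Root()

    // Hypothetical equivalent of KafkaIO.<String, String>read()
    msgs := kafkaio.Read(s,
        kafkaio.BootstrapServers(kafkaBroker),
        kafkaio.Topic(kafkaTopic),
        kafkaio.ConsumerConfigUpdates(consumerConfig),
    )
    _ = msgs // downstream transforms (e.g., write to Bigtable) would go here

    if err := beamx.Run(context.Background(), p); err != nil {
        panic(err)
    }
}
```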
Do you have any information on the availability of a KafkaIO connector for the Go SDK?
Any help or information would be greatly appreciated.
Thank you.
Best Answer
@cricket_007, in case you are also curious: I received the following update from Robert Burke (rebo@google.com) of the Apache Beam team:
There presently isn't a Kafka transform for Go.
The Go SDK is still experimental, largely due to scalable IO support, which is why the Go SDK isn't represented in the built-in io page.
There's presently no way for an SDK user to write a Streaming source in the Go SDK, since there's no mechanism for a DoFn to "self terminate" bundles, such as to allow for scalability and windowing from streaming sources.
However, SplittableDoFns are on their way, and will eventually be the solution for writing these.
At present, the Beam Go SDK IOs haven't been tested and vetted for production use. Until the initial SplittableDoFn support is added to the Go SDK, Batch transforms cannot split, and can't scale beyond a single worker thread. This batch version should land in the next few months, and the streaming version land a few months after that, after which a Kafka IO can be developed.
I wish I had better news for you, but I can say progress is being made.
Robert Burke
Regarding "go - Is a KafkaIO connector / Apache Beam transform available in the Go SDK?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59905546/