apache-flink - 弗林克 : ValueState vs ReducingState/AggregatingState

背景

ValueState文档指出:partitioned single-value state .
ReducingState文档指出:combined using a reduce function .
AggregatingState文档指出:eagerly pre-aggregated .
ValueState延伸State而两者ReducingState和AggregatingState延伸MergingState .

问题

各州何时合并？
我应该如何为特定问题选择正确的状态原语？
什么机制调用reduce和aggregate函数？它是否跳过非 MergingState ？

最佳答案

Fabian Hueske 不久前(2018 年 5 月 6 日)回答了我关于 AggregateFunctions 中合并的问题。他说:

The only situation when merge() is called in a DataStream job (that I am aware of) is when session windows get merged. For example when you define a session window with 30 minute gap and you receive the following records R1, 12:00:00 R2, 12:05:00 R3, 12:40:00 R4, 12:20:00

In this case, Flink R1 will create a new window W1, R2 will be assigned to W1, R3 > creates a new window W2, and R4 connects and merges W1 and W2.

我认为您其他问题的部分答案是 ValueState 是通用(键控)状态。因此，当您实现通用函数时，您最终会使用它，而不是聚合器或缩减器(带有组合器)。

关于apache-flink - 弗林克 : ValueState vs ReducingState/AggregatingState，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/57943469/

上一篇：angular - 401 错误代码未刷新刷新 token

下一篇：reactjs - Axios POST 404 未找到

java - Apache 弗林克 : Process data in order with mapPartition

apache-flink - Flink，使用多个Kafka源时如何正确设置并行性？

apache-flink - Apache Flink - 启用连接排序

apache-flink - 弗林克 : Stateful stream processing by key

java - 弗林克 : Extract array from a Row Field

java - Flink Avro 1.8.1 在集群上运行时出现 NoSuchMethodError

apache-flink - 找不到 org.apache.flink.streaming.api.scala.DataStream 的 Apache Flink 类文件

apache-spark - 在实践中，迷你批处理与实时流之间有什么区别(不是理论上的区别)？

python - TensorFlow Extended (TFX) : Clarify Beam, Airflow 和 Kubeflow 使用