postgresql - kafka-connect-jdbc does not fetch consecutive timestamps from the source

Tags: postgresql jdbc apache-kafka apache-kafka-connect confluent-platform

I am using kafka-connect-jdbc-4.0.0.jar and postgresql-9.4-1206-jdbc41.jar.

Kafka Connect connector configuration:

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "mode": "timestamp",
  "timestamp.column.name": "updated_at",
  "topic.prefix": "streaming.data.v2",
  "connection.password": "password",
  "connection.user": "user",
  "schema.pattern": "test",
  "query": "select * from view_source",
  "connection.url": "jdbc:postgresql://host:5432/test?currentSchema=test"
}

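I have configured two connectors, one source and one sink, against a PostgreSQL database ("PostgreSQL 9.6.9") using the JDBC driver. Everything works.

I have a question about how the connector collects data from the source. Looking at the logs, I see a 21-second gap between the time ranges of consecutive queries: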

11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} prepared SQL query: select * from view_source WHERE "updated_at" > ? AND "updated_at" < ? ORDER BY "updated_at" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG executing query select CURRENT_TIMESTAMP; to get current time from database (io.confluent.connect.jdbc.util.JdbcUtils)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:17:07.000 end time = 2019-01-11 08:20:18.985 (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:19[2019-01-11 08:20:19,070] DEBUG Resetting querier TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)

11/1/2019 9:20:49[2019-01-11 08:20:49,499] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} prepared SQL query: select * from view_source WHERE "updated_at" > ? AND "updated_at" < ? ORDER BY "updated_at" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG executing query select CURRENT_TIMESTAMP; to get current time from database (io.confluent.connect.jdbc.util.JdbcUtils)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:20:39.000 end time = 2019-01-11 08:20:49.500 (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)

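The first query collects data between 08:17:07.000 and 08:20:18.985, but the second query collects data between 08:20:39.000 and 08:20:49.500. That leaves a 21-second gap between the two ranges in which there could be records: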

11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:17:07.000 end time = 2019-01-11 08:20:18.985 
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:20:39.000 end time = 2019-01-11 08:20:49.500 
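
To make the gap concrete, here is a small sketch that does the arithmetic on the four bounds copied from the two log lines above. It only restates the logged values; the variable names are made up for illustration. The result works out to roughly 20 seconds, in line with the figure quoted in the question.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Window bounds copied from the two "Executing prepared statement" log lines.
first_start = datetime.strptime("2019-01-11 08:17:07.000", FMT)
first_end = datetime.strptime("2019-01-11 08:20:18.985", FMT)
second_start = datetime.strptime("2019-01-11 08:20:39.000", FMT)
second_end = datetime.strptime("2019-01-11 08:20:49.500", FMT)

# Interval that neither of the two logged queries covers.
gap = second_start - first_end
print(f"uncovered interval: {first_end} .. {second_start}")
print(f"gap length: {gap.total_seconds():.3f} s")
```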

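I assume that one of the two values is the timestamp of the last record retrieved and the other is the current timestamp at the time of the query.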
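
That assumption can be written down as a simplified model. The sketch below is based only on the prepared SQL visible in the log (`"updated_at" > ? AND "updated_at" < ?`, ordered ascending) and assumes the lower bound is the newest `updated_at` returned so far. The `poll` helper and the row values are hypothetical, and this is not the connector's actual code.

```python
from datetime import datetime

def ts(text):
    """Parse a timestamp in the format shown in the connector log."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")

def poll(rows, offset, db_now):
    """Return rows with offset < updated_at < db_now, oldest first, plus the new offset."""
    batch = sorted(r for r in rows if offset < r < db_now)
    # The offset only moves when rows come back; it becomes the newest timestamp seen.
    new_offset = batch[-1] if batch else offset
    return batch, new_offset

# Hypothetical updated_at values in view_source.
rows = [ts("2019-01-11 08:18:30.000"), ts("2019-01-11 08:20:05.000")]

offset = ts("2019-01-11 08:17:07.000")
batch, offset = poll(rows, offset, ts("2019-01-11 08:20:18.985"))
print(len(batch), "rows; next lower bound:", offset)
```

Under this model the next query starts from the last row's timestamp, not from the previous query's end time, so the start of one window need not line up with the end of the previous one.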

I can't find an explanation for this. Is the connector working correctly? Should I assume that it will not always collect all the data?

Best answer

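The JDBC connector is not guaranteed to retrieve every message. For that, you need log-based change data capture. For Postgres, this is provided by Debezium and Kafka Connect. You can read more about this here.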
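
One way a timestamp-based poll can skip a row is sketched below: a transaction assigns `updated_at` early but commits only after a later poll has already moved the offset past that value. This is an illustrative scenario, not something taken from the logs in the question. The `Row` type, the `visible_from` field, and the `poll` function are hypothetical stand-ins, with times given as plain seconds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    key: str
    updated_at: float    # value written to the timestamp column (seconds)
    visible_from: float  # moment the writing transaction commits

def poll(table, offset, now):
    """Timestamp-style poll: only committed rows with offset < updated_at < now."""
    batch = sorted(
        (r for r in table if r.visible_from <= now and offset < r.updated_at < now),
        key=lambda r: r.updated_at,
    )
    return batch, (batch[-1].updated_at if batch else offset)

table = [
    Row("a", updated_at=10.0, visible_from=10.0),
    # Long transaction: timestamp assigned at t=11, but committed only at t=16.
    Row("b", updated_at=11.0, visible_from=16.0),
    Row("c", updated_at=12.0, visible_from=12.0),
]

offset, seen = 0.0, []
for now in (15.0, 20.0):
    batch, offset = poll(table, offset, now)
    seen += [r.key for r in batch]

print(seen)  # row "b" never appears
```

The first poll advances the offset to 12.0 (row "c"), so when row "b" becomes visible its timestamp of 11.0 is already below the lower bound. A log-based approach reads changes in commit order and does not depend on such a timestamp filter.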

Disclaimer: I work for Confluent, and wrote the above blog.

Edit: there is also a recording of the above blog from ApacheCon 2020: 🎥 https://rmoff.dev/no-more-silos

Regarding "postgresql - kafka-connect-jdbc does not fetch consecutive timestamps from the source", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54183118/
