google-bigquery - How to integrate Google Cloud SQL with Google BigQuery

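I am designing a solution in which Google Cloud SQL will store all the data produced by the normal operation of the application (OLTP-style data). The data is expected to grow quite large over time. The data itself is relational in nature, which is why we chose Cloud SQL over Cloud Datastore.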

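This data needs to be fed into BigQuery for analytics. Ideally the analytics would be near real time, although in practice some lag is to be expected. I am trying to design a solution that keeps this lag to a minimum.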

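My question has 3 parts:

1. Should I store the data in Cloud SQL and then move it to BigQuery, or should I change the basic design and store the data in BigQuery from the start? Is BigQuery suitable for regular, low-latency OLTP workloads? (I don't think so. Is my assumption correct?)

2. What is the recommended/best practice for loading Cloud SQL data into BigQuery so that the integration works in near real time?

3. Is Cloud Dataflow a good option? If I connect Cloud SQL to Cloud Dataflow and then on to BigQuery, will that work? Or is there a better way to achieve this (as asked in part 2)?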

Best answer

Take a look at how WePay does this:

https://wecode.wepay.com/posts/bigquery-wepay

The MySQL to GCS operator executes a SELECT query against a MySQL table. The SELECT pulls all data greater than (or equal to) the last high watermark. The high watermark is either the primary key of the table (if the table is append-only), or a modification timestamp column (if the table receives updates). Again, the SELECT statement also goes back a bit in time (or rows) to catch potentially dropped rows from the last query (due to the issues mentioned above).

Using Airflow, they manage to keep BigQuery in sync with their MySQL database every 15 minutes.
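The high-watermark pull described in the quoted passage can be sketched roughly as follows. This is only an illustration, not WePay's actual operator: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the `events` table, the `fetch_incremental` function, and the `lookback` parameter are hypothetical names made up for the example. It covers only the append-only case, where the primary key serves as the watermark.

```python
import sqlite3


def fetch_incremental(conn, last_watermark, lookback=0):
    """Select rows with id >= (last_watermark - lookback).

    Returns the rows and the new high watermark (the largest id seen).
    `lookback` re-reads a few earlier rows to catch ones that a
    previous run may have missed.
    """
    start = max(last_watermark - lookback, 0)
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id >= ? ORDER BY id",
        (start,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_watermark
    return rows, new_watermark


def demo():
    # In-memory SQLite stands in for the MySQL source (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, 6)],
    )

    # First run: no watermark yet, so everything is pulled.
    first_batch, watermark = fetch_incremental(conn, last_watermark=0)

    # New rows arrive between scheduled runs.
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)", [(6, "row-6"), (7, "row-7")]
    )

    # Second run: start slightly before the old watermark.
    second_batch, new_watermark = fetch_incremental(
        conn, last_watermark=watermark, lookback=2
    )
    conn.close()
    return first_batch, watermark, second_batch, new_watermark


if __name__ == "__main__":
    for part in demo():
        print(part)
```

Because the query deliberately reaches back before the previous watermark, some rows are read more than once, so whatever loads the result into BigQuery would presumably need to deduplicate by key. For a table that receives updates, the same pattern would apply with a modification timestamp column in place of `id`.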

Regarding "google-bigquery - How to integrate Google Cloud SQL with Google BigQuery", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46369952/
