java - Data validation between two heterogeneous systems in Java

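Because of a data migration from an RDBMS (Oracle/Teradata) to HDFS (Hive), I need to compare the complete data sets between the RDBMS and Hive. I know that pulling large volumes of data out of the RDBMS and Hive is a big network overhead, but that is the requirement. I developed a basic Java framework in Eclipse that takes a source query and a target query (with a limited number of rows), fetches the RDBMS and Hive result sets, and compares them side by side. To make the validation more comprehensive, though, I have to compare the keys of the two systems and check for duplicates in both. This is what I have tried so far: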

I initialised two HashMaps, one for the RDBMS and one for Hive, using the primary key as the map key and the non-key attributes in an ArrayList as the value, and then tried to compare the keys and values between the two maps. But loading two result sets and two HashMaps into RAM degrades performance.

I tried using Redis, an in-memory database, to store the key/value pairs. However, since I am accessing Redis from a Java program, I am not sure how to use Redis hashes and sets the way we use HashMap and HashSet in Java.

I wrote the result sets to two separate text files, but writing the files and then reading and processing them is time-consuming.
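The first attempt above (two maps keyed by primary key) can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the class `MapCompare`, its methods, and the row format (a `String[]` whose first element is the primary key, standing in for a JDBC `ResultSet` row) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MapCompare {

    /** Keys that differ between the two systems, grouped by kind of difference. */
    public static final class Result {
        public final Set<String> missingInTarget = new TreeSet<>();
        public final Set<String> missingInSource = new TreeSet<>();
        public final Set<String> valueMismatches = new TreeSet<>();
    }

    /**
     * Loads rows into a map keyed by primary key.
     * Any key seen more than once is added to the duplicates set.
     */
    public static Map<String, List<String>> load(List<String[]> rows, Set<String> duplicates) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String[] row : rows) {
            String key = row[0]; // assumption: column 0 holds the primary key
            List<String> values = Arrays.asList(row).subList(1, row.length);
            if (byKey.putIfAbsent(key, values) != null) {
                duplicates.add(key); // the key was already present
            }
        }
        return byKey;
    }

    /** Compares keys and non-key values between the source and target maps. */
    public static Result compare(Map<String, List<String>> source,
                                 Map<String, List<String>> target) {
        Result result = new Result();
        for (Map.Entry<String, List<String>> entry : source.entrySet()) {
            List<String> other = target.get(entry.getKey());
            if (other == null) {
                result.missingInTarget.add(entry.getKey());
            } else if (!other.equals(entry.getValue())) {
                result.valueMismatches.add(entry.getKey());
            }
        }
        for (String key : target.keySet()) {
            if (!source.containsKey(key)) {
                result.missingInSource.add(key);
            }
        }
        return result;
    }
}
```

Both maps have to fit in the heap at the same time, which is the memory cost described above.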

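For the part that fetches data from the RDBMS, I did what is described here and here. I suppose there may be tools that do this job, but I am trying to build something with open-source components.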

Best answer

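Does your data have a timestamp or any incrementing value that could be used to order it, or can the duplicates of an element from one data source appear anywhere in the other data source? If there is anything to order the data by (such as a timestamp), you could use any kind of streaming system and "simply" perform a select distinct. However, more information about the kind of data you are working with is needed.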
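One way to exploit such an ordering, offered here as a sketch and not necessarily what the answer has in mind, is a sorted merge comparison: if both queries return rows ordered by the same key, the two result sets can be walked in lockstep with only one row per side in memory. The class `SortedStreamCompare` and the row format (a `String[]` whose first element is the key) are hypothetical, and the code assumes both systems sort keys the same way Java's `String.compareTo` does, which would need checking against the actual collations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedStreamCompare {

    /** Keys reported by the comparison, in the order they were encountered. */
    public static final class Report {
        public final List<String> missingInTarget = new ArrayList<>();
        public final List<String> missingInSource = new ArrayList<>();
        public final List<String> mismatched = new ArrayList<>();
        public final List<String> duplicatesInSource = new ArrayList<>();
        public final List<String> duplicatesInTarget = new ArrayList<>();
    }

    /** Holds the current row of one sorted stream and records repeated keys. */
    private static final class Cursor {
        private final Iterator<String[]> rows;
        private final List<String> duplicates;
        String[] current;

        Cursor(Iterator<String[]> rows, List<String> duplicates) {
            this.rows = rows;
            this.duplicates = duplicates;
            advance();
        }

        /** Moves to the next distinct key, recording each skipped duplicate key once. */
        void advance() {
            String[] previous = current;
            current = rows.hasNext() ? rows.next() : null;
            while (previous != null && current != null && current[0].equals(previous[0])) {
                int last = duplicates.size() - 1;
                if (last < 0 || !duplicates.get(last).equals(current[0])) {
                    duplicates.add(current[0]);
                }
                current = rows.hasNext() ? rows.next() : null;
            }
        }
    }

    /** Assumption: both iterators yield rows sorted ascending by row[0]. */
    public static Report compare(Iterator<String[]> source, Iterator<String[]> target) {
        Report report = new Report();
        Cursor s = new Cursor(source, report.duplicatesInSource);
        Cursor t = new Cursor(target, report.duplicatesInTarget);
        while (s.current != null || t.current != null) {
            int order = s.current == null ? 1
                      : t.current == null ? -1
                      : s.current[0].compareTo(t.current[0]);
            if (order < 0) {            // key exists only in the source
                report.missingInTarget.add(s.current[0]);
                s.advance();
            } else if (order > 0) {     // key exists only in the target
                report.missingInSource.add(t.current[0]);
                t.advance();
            } else {                    // same key: compare the whole row
                if (!Arrays.equals(s.current, t.current)) {
                    report.mismatched.add(s.current[0]);
                }
                s.advance();
                t.advance();
            }
        }
        return report;
    }
}
```

In a real run the iterators would wrap the two JDBC result sets, each produced by a query with an `ORDER BY` on the key. Only the first row seen for a duplicated key is compared, so duplicates are reported separately from value mismatches.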

Regarding java - Data validation between two heterogeneous systems in Java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38593607/
