sql - How to generate test data for a "group by data from other rows" algorithm

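Update: I am looking for a technique to work out data for all the edge cases of my algorithm (or of any algorithm, for that matter).
So far I have only been thinking about what the edge cases might be and generating some "random" data, but I don't know how to be more confident that I haven't missed something that real users will be able to mess up.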

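I want to check that I haven't missed anything important in my algorithm, and I don't know how to generate test data that covers all possible cases: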

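The task is to report a snapshot of the data for each Event_Date, but to create a separate row for edits that possibly belong to the next Event_Date (see group 2 in the illustration of the input and output data):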

[Image: input and output data illustration]

My algorithm:

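1. List the event_dates and compute next_event_date for each of them.
2. Join the result to main_audit_table and compute the largest transaction_id for each snapshot (groups 1-4 in my illustration), grouping by id, event_date, and one of 2 options depending on whether transaction_date < next_event_date is true or false.
3. Join main_audit_table to the result to get the other data from the same transaction_id.
4. Join costs_audit_table to the result, using the largest transaction_id that is smaller than the transaction_id from the result.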

My questions:

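1. How can I generate test data that covers all possible scenarios, so that I know my algorithm is correct?
2. Can you see any mistakes in the logic of my algorithm?
3. Is there a better forum for this kind of question?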

My code (needs testing):

select
    snapshots.id,
    snapshots.event_date,
    main.event,
    main.transaction_date as last_change,
    costs.costs as costs_2012
  from (
    --snapshots that return correct transaction ids grouped by event_date
    select
      main_grp.id,
      main_grp.event_date,
      max(main_grp.transaction_id) main_transaction_id,
      max(costs_grp.transaction_id) costs_transaction_id
    from main_audit_table main_grp
    join (
      --list of all event_dates and their next_event_dates
      select
        id,
        event_date,
        coalesce(lead(event_date) over (partition by id order by event_date),
                 '1.1.2099') next_event_date
      from main_audit_table
      -- the outer alias main_grp is not in scope inside this derived table
      group by id, event_date
    ) list on list.id = main_grp.id and list.event_date = main_grp.event_date
    left join costs_audit_table costs_grp
      on costs_grp.id = main_grp.id and
         costs_grp.year = 2012 and
         costs_grp.transaction_id <= main_grp.transaction_id
    group by
      main_grp.id,
      main_grp.event_date,
      case when main_grp.transaction_date < list.next_event_date
           then 1
           else 0 end
  ) snapshots
  join main_audit_table main
    on main.id = snapshots.id and
       main.transaction_id = snapshots.main_transaction_id
  left join costs_audit_table costs
    on costs.id = snapshots.id and
       costs.transaction_id = snapshots.costs_transaction_id

Best answer

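Common table expressions (CTEs) are not only a good way to hide complexity and reduce repetition in longer pieces of SQL, but also a simple way to represent test data as if it came from a permanent table. At the very least, CTEs bring the main parts of the query into focus at the top and let you refer to them by their labels throughout the rest of the statement. Graeme Birchall's DB2 SQL Cookbook (a well-maintained free e-book) has some good examples of this and other advanced SQL patterns. Joe Celko is another good source of ideas on how to make SQL do more of the heavy lifting for you.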
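
As a minimal sketch of the pattern described above, the statement below uses a CTE named main_audit_table in place of the real table, fills it with a few hand-picked rows, and runs the first step of the question's algorithm (computing next_event_date) against it. The column list, the sample values, and the '2099-01-01' sentinel are illustrative assumptions, not the asker's real schema or data. It also assumes a dialect that accepts VALUES inside a CTE and supports the lead() window function (for example PostgreSQL or a recent SQLite). Dates are written as ISO strings so that they compare correctly as text.

```sql
-- Hypothetical test rows standing in for the real main_audit_table.
-- Column names follow the question; every value here is made up.
with main_audit_table (id, transaction_id, transaction_date, event_date, event) as (
  values
    (1, 10, '2012-01-05', '2012-02-01', 'created'),    -- ordinary first row
    (1, 11, '2012-02-20', '2012-03-01', 'postponed'),  -- event_date changed
    (1, 12, '2012-03-01', '2012-02-01', 'moved back'), -- edit dated exactly on the next event_date
    (2, 13, '2012-01-10', '2012-02-01', 'created')     -- id with a single event_date
),
-- Step 1 of the algorithm: each event_date and the one that follows it.
list as (
  select
    id,
    event_date,
    coalesce(lead(event_date) over (partition by id order by event_date),
             '2099-01-01') as next_event_date           -- sentinel when there is no later date
  from main_audit_table
  group by id, event_date
)
-- Show how each test row would be classified by the grouping flag.
select
  m.id,
  m.transaction_id,
  m.event_date,
  l.next_event_date,
  case when m.transaction_date < l.next_event_date
       then 1
       else 0 end as before_next_event
from main_audit_table m
join list l
  on l.id = m.id and l.event_date = m.event_date
order by m.id, m.transaction_id;
```

With these rows, only transaction 12 should come back with before_next_event = 0, because its transaction_date equals the next event_date and the comparison is strict. Each edge case you can think of becomes one more row in the VALUES list, and the rest of the query stays unchanged because it refers to the CTE by the same name as the real table.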

Regarding sql - How to generate test data for a "group by data from other rows" algorithm, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10305396/
