sql - Query to count the frequency of many-to-many associations

I have two tables with a many-to-many association in PostgreSQL. The first table contains activities, each of which may have zero or more reasons:

CREATE TABLE activity (
   id integer NOT NULL,
   -- other fields removed for readability
);

CREATE TABLE reason (
   id varchar(1) NOT NULL,
   -- other fields here
);

To implement the association, a join table sits between these two tables:

CREATE TABLE activity_reason (
   activity_id integer NOT NULL, -- refers to activity.id
   reason_id varchar(1) NOT NULL, -- refers to reason.id
   CONSTRAINT activity_reason_activity FOREIGN KEY (activity_id) REFERENCES activity (id),
   CONSTRAINT activity_reason_reason FOREIGN KEY (reason_id) REFERENCES reason (id)
);

I want to count how often each combination of reasons occurs across activities. Suppose I have these records in the table activity_reason:

+--------------+------------+
| activity_id  |  reason_id |
+--------------+------------+
|           1  |          A |
|           1  |          B |
|           2  |          A |
|           2  |          B |
|           3  |          A |
|           4  |          C |
|           4  |          D |
|           4  |          E |
+--------------+------------+
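
To reproduce this, the sample rows could be loaded with a script along these lines. It is only a sketch: the primary keys are an assumption on my part (the real tables have more columns and constraints than shown here), added so that the foreign keys have something to reference.

```sql
-- Minimal versions of the three tables. The PRIMARY KEY constraints are
-- assumed for this sketch; the other columns are omitted.
CREATE TABLE activity (
   id integer PRIMARY KEY
);

CREATE TABLE reason (
   id varchar(1) PRIMARY KEY
);

CREATE TABLE activity_reason (
   activity_id integer NOT NULL REFERENCES activity (id),
   reason_id varchar(1) NOT NULL REFERENCES reason (id)
);

-- Parent rows first, so the foreign keys are satisfied.
INSERT INTO activity (id) VALUES (1), (2), (3), (4);
INSERT INTO reason (id) VALUES ('A'), ('B'), ('C'), ('D'), ('E');

-- The eight association rows from the table above.
INSERT INTO activity_reason (activity_id, reason_id) VALUES
   (1, 'A'), (1, 'B'),
   (2, 'A'), (2, 'B'),
   (3, 'A'),
   (4, 'C'), (4, 'D'), (4, 'E');
```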

I should get something like this:

+-------+---+------+-------+
| count |   |      |       |
+-------+---+------+-------+
|     2 | A | B    | NULL  |
|     1 | A | NULL | NULL  |
|     1 | C | D    | E     |
+-------+---+------+-------+

Or, failing that, something like this:

+-------+-------+
| count |       |
+-------+-------+
|     2 | A,B   |
|     1 | A     |
|     1 | C,D,E |
+-------+-------+

I can't work out the SQL query that does this.

Best answer

I think you can get what you want with this query:

SELECT count(*) AS count, reasons
FROM (
  -- one array of reasons per activity
  SELECT activity_id, array_agg(reason_id) AS reasons
  FROM (
    -- the key column of activity is id; expose it as activity_id
    SELECT A.id AS activity_id, AR.reason_id
    FROM activity A
    LEFT JOIN activity_reason AR ON AR.activity_id = A.id
    ORDER BY activity_id, reason_id
  ) AS ordered_reasons
  GROUP BY activity_id
) reason_arrays
GROUP BY reasons;

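First, you aggregate all the reasons of an activity into one array per activity. You have to sort the associations first, otherwise ['a','b'] and ['b','a'] would be treated as different sets and would get separate counts. You also need the LEFT JOIN to activity, otherwise any activity without reasons would not show up in the result set. I'm not sure whether that is what you want; I can take the join out again if activities without reasons should be excluded. Finally, you count the number of activities that share the same set of reasons.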
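
If activities without reasons should be left out, a variant along these lines might do. It is a sketch rather than a tested drop-in: it reads only activity_reason, and it moves the sort into the aggregate call with array_agg(... ORDER BY ...). Feeding an aggregate from a sorted subquery usually works in PostgreSQL, but the ORDER BY inside the aggregate is the documented way to fix the element order. The outer ORDER BY is my own addition, only there to make the output stable.

```sql
SELECT count(*) AS count, reasons
FROM (
  -- One sorted array of reasons per activity. Activities with no rows in
  -- activity_reason never appear here, so they are not counted.
  SELECT AR.activity_id,
         array_agg(AR.reason_id ORDER BY AR.reason_id) AS reasons
  FROM activity_reason AR
  GROUP BY AR.activity_id
) reason_arrays
GROUP BY reasons
ORDER BY count DESC, reasons;

-- With the sample rows from the question this returns:
--   2 | {A,B}
--   1 | {A}
--   1 | {C,D,E}
```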

Here is a sqlfiddle demo.

As Gordon Linoff mentioned, you could also use strings instead of arrays. I'm not sure which performs better.
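
A string-based version could look roughly like the sketch below, which uses string_agg with a comma separator to produce values such as A,B, matching the second output format in the question. The subquery alias reason_strings is made up for this example. An activity without reasons gets a NULL string here, and all such activities are counted together in a single NULL group.

```sql
SELECT count(*) AS count, reasons
FROM (
  -- One comma-separated, sorted string of reasons per activity.
  -- string_agg skips NULL inputs, so an activity with no reasons
  -- yields NULL rather than an empty string.
  SELECT A.id AS activity_id,
         string_agg(AR.reason_id, ',' ORDER BY AR.reason_id) AS reasons
  FROM activity A
  LEFT JOIN activity_reason AR ON AR.activity_id = A.id
  GROUP BY A.id
) reason_strings
GROUP BY reasons;
```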

The original question and answer are on Stack Overflow: https://stackoverflow.com/questions/36439410/
