sql - Count combinations of columns in a PostgreSQL matrix

Tags: sql postgresql matrix combinations

I have a table in Postgres like the following:

(table not reproduced)

I want an SQL query in Postgres that counts, for every combination of 2 columns, how many rows have Y in both columns (YY).

I expect output like this:

| combo | count |
|-------+-------|
| AB    |     2 |
| AC    |     1 |
| AD    |     2 |
| AZ    |     1 |
| BC    |     1 |
| BD    |     3 |
| BZ    |     2 |
| CD    |     2 |
| CZ    |     0 |
| DZ    |     1 |

Can anyone help me?
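The question's table did not survive in this copy. The answer's queries read from a table named `test` with an `id` column and one text column per letter (`a`, `b`, `c`, `d`, `z`). The script below is a hypothetical stand-in for experimenting: the five rows are invented, not the asker's data, and were chosen so that the answer's queries return exactly the pair and triple counts shown in its outputs. Using `'N'` as the "not Y" value and `integer` for `id` are also assumptions.

```sql
-- Hypothetical sample data, NOT the asker's table.
-- Rows chosen so the answer's queries reproduce the counts it shows.
-- 'N' as the "not Y" marker is an assumption.
CREATE TEMP TABLE test (
    id integer PRIMARY KEY,
    a  text,
    b  text,
    c  text,
    d  text,
    z  text
);

INSERT INTO test (id, a, b, c, d, z) VALUES
    (1, 'Y', 'Y', 'N', 'Y', 'Y'),  -- Y in A, B, D, Z
    (2, 'Y', 'Y', 'N', 'N', 'Y'),  -- Y in A, B, Z
    (3, 'Y', 'N', 'Y', 'Y', 'N'),  -- Y in A, C, D
    (4, 'N', 'Y', 'Y', 'Y', 'N'),  -- Y in B, C, D
    (5, 'N', 'Y', 'N', 'Y', 'N');  -- Y in B, D
```

For example, `B` and `D` are both `'Y'` in rows 1, 4 and 5, which matches the `BD 3` line of the answer's output.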

Best answer

WITH stacked AS (
    -- Unpivot: one row per (id, column name, column value)
    SELECT id
        , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
        , unnest(array[a, b, c, d, z]) AS col_value
    FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
    -- Self-join each row's cells; col_name < col_name keeps each unordered pair once
    SELECT t1.id, t1.col_name || t2.col_name AS combo
        , (CASE WHEN t1.col_value = 'Y' AND t2.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
    FROM stacked t1
    INNER JOIN stacked t2
    ON t1.id = t2.id
    AND t1.col_name < t2.col_name) t3
GROUP BY combo
ORDER BY combo;

yields

| combo | count |
|-------+-------|
| AB    |     2 |
| AC    |     1 |
| AD    |     2 |
| AZ    |     2 |
| BC    |     1 |
| BD    |     3 |
| BZ    |     2 |
| CD    |     2 |
| CZ    |     0 |
| DZ    |     1 |

The `unnest` technique used to unpivot the table comes from Stew's post, here.
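The same unpivot can also be written with the multi-argument form of `unnest` in the `FROM` clause, which walks the name array and the value array in parallel and so makes the pairing explicit. Combined with an aggregate `FILTER` clause instead of `CASE`/`sum`, it gives a slightly shorter version of the pair query. This is a swapped-in formulation, not the answer's code; it assumes PostgreSQL 9.4 or later, and the inline rows are the same hypothetical sample data, not the asker's.

```sql
-- Sketch of an alternative formulation, not the answer's original query.
-- Requires PostgreSQL 9.4+ (multi-argument unnest in FROM, aggregate FILTER).
WITH test (id, a, b, c, d, z) AS (
    -- Hypothetical inline rows so the script runs on its own;
    -- drop this CTE to query a real table named test instead.
    VALUES (1, 'Y', 'Y', 'N', 'Y', 'Y'),
           (2, 'Y', 'Y', 'N', 'N', 'Y'),
           (3, 'Y', 'N', 'Y', 'Y', 'N'),
           (4, 'N', 'Y', 'Y', 'Y', 'N'),
           (5, 'N', 'Y', 'N', 'Y', 'N')
),
stacked AS (
    -- unnest(names, values) walks both arrays side by side:
    -- one output row per (id, column name, column value).
    SELECT t.id, u.col_name, u.col_value
    FROM test t
    CROSS JOIN LATERAL unnest(array['A', 'B', 'C', 'D', 'Z'],
                              array[t.a, t.b, t.c, t.d, t.z]) AS u(col_name, col_value)
)
SELECT t1.col_name || t2.col_name AS combo,
       -- count only rows where both cells are 'Y'; pairs with no match get 0
       count(*) FILTER (WHERE t1.col_value = 'Y' AND t2.col_value = 'Y') AS count
FROM stacked t1
JOIN stacked t2
  ON t1.id = t2.id
 AND t1.col_name < t2.col_name   -- each unordered pair of columns once
GROUP BY combo
ORDER BY combo;
```

With these hypothetical rows it returns the same ten pairs and counts as the answer's output above. Like the original, it assumes `id` identifies a single row.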


To count occurrences of YYY across combinations of 3 columns, you can use:

WITH stacked AS (
    -- Unpivot: one row per (id, column name, column value)
    SELECT id
        , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
        , unnest(array[a, b, c, d, z]) AS col_value
    FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
    SELECT t1.id, t1.col_name || t2.col_name || t3.col_name AS combo
        , (CASE WHEN t1.col_value = 'Y'
               AND t2.col_value = 'Y'
               AND t3.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
    FROM stacked t1
    INNER JOIN stacked t2
    ON t1.id = t2.id
    AND t1.col_name < t2.col_name   -- strictly increasing names: each triple once
    INNER JOIN stacked t3
    ON t1.id = t3.id
    AND t2.col_name < t3.col_name
    ) sub
GROUP BY combo
ORDER BY combo
;

produces

| combo | count |
|-------+-------|
| ABC   |     0 |
| ABD   |     1 |
| ABZ   |     2 |
| ACD   |     1 |
| ACZ   |     0 |
| ADZ   |     1 |
| BCD   |     1 |
| BCZ   |     0 |
| BDZ   |     1 |
| CDZ   |     0 |

Alternatively, to handle combinations of N columns, you can use WITH RECURSIVE. For example, for N = 3:

WITH RECURSIVE result AS (
    WITH stacked AS (
        -- Unpivot: one row per (id, column name, column value)
        SELECT id
            , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
            , unnest(array[a, b, c, d, z]) AS col_value
        FROM test t)
    -- Anchor: every single column of every row starts a path
    SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
    FROM stacked

    UNION

    -- Step: append a later column of the same row to each path
    SELECT r.id, path || s.col_name, path_val || s.col_value, s.col_name
    FROM result r
    INNER JOIN stacked s
    ON r.id = s.id
        AND s.col_name > r.last_name
    WHERE array_length(r.path, 1) < 3)  -- Change 3 to your value for N
SELECT combo, sum(cnt) AS count
FROM (
    SELECT id, array_to_string(path, '') AS combo, (CASE WHEN 'Y' = all(path_val) THEN 1 ELSE 0 END) AS cnt
    FROM result
    WHERE array_length(path, 1) = 3) t  -- Change 3 to your value for N
GROUP BY combo
ORDER BY combo;

Note that N = 3 appears in 2 places in the SQL above, and both must be changed together.
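Because the two occurrences of N must agree, one way to avoid editing both is to put N in its own small CTE and read it with a scalar subquery. The sketch below does that, moves `stacked` up to a sibling CTE, and counts with `FILTER`. It also uses `UNION ALL`: assuming `id` is unique (as the answer's self-join already does), each path of strictly increasing column names is generated exactly once, so the duplicate elimination of `UNION` is unnecessary. The `params` CTE, its column `n`, and the inline rows are hypothetical additions, not part of the answer.

```sql
-- Sketch: the answer's recursive query with N written only once.
-- "params" and "n" are hypothetical names introduced here.
WITH RECURSIVE
params AS (
    SELECT 3 AS n                      -- combination size N: change it only here
),
test (id, a, b, c, d, z) AS (
    -- Hypothetical inline rows; drop this CTE to use a real table named test.
    VALUES (1, 'Y', 'Y', 'N', 'Y', 'Y'),
           (2, 'Y', 'Y', 'N', 'N', 'Y'),
           (3, 'Y', 'N', 'Y', 'Y', 'N'),
           (4, 'N', 'Y', 'Y', 'Y', 'N'),
           (5, 'N', 'Y', 'N', 'Y', 'N')
),
stacked AS (
    -- Unpivot: one row per (id, column name, column value)
    SELECT id
        , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
        , unnest(array[a, b, c, d, z]) AS col_value
    FROM test
),
result AS (
    -- Anchor: every single column of every row starts a path
    SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
    FROM stacked

    UNION ALL

    -- Step: append a later column of the same row until the path has N entries
    SELECT r.id, r.path || s.col_name, r.path_val || s.col_value, s.col_name
    FROM result r
    INNER JOIN stacked s
        ON r.id = s.id
       AND s.col_name > r.last_name
    WHERE array_length(r.path, 1) < (SELECT n FROM params)
)
SELECT array_to_string(path, '') AS combo,
       count(*) FILTER (WHERE 'Y' = ALL (path_val)) AS count
FROM result
WHERE array_length(path, 1) = (SELECT n FROM params)
GROUP BY combo
ORDER BY combo;
```

With the hypothetical rows and `n = 3` it returns the same ten triples and counts as the three-column query's output above; setting `n` to 2 gives the pair counts instead.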

For "sql - Count combinations of columns in a PostgreSQL matrix", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54781815/
