mysql - Calculating overlap in MySQL

Tags: mysql sql circos

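I'm trying to find out which classes overlap the most. The data is stored in MySQL, and each student has a completely separate row in the database for every class he or she takes (I didn't set it up this way and can't change it). I've pasted a simplified version of the table below. In reality there are about 20 different classes.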

CREATE TABLE classes
(`student_id` int, `class` varchar(13));
INSERT INTO classes
(`student_id`, `class`)
VALUES
(55421, 'algebra'),
(27494, 'algebra'),
(64934, 'algebra'),
(65364, 'algebra'),
(21102, 'algebra'),
(90734, 'algebra'),
(20103, 'algebra'),
(57450, 'gym'),
(76411, 'gym'),
(24918, 'gym'),
(65364, 'gym'),
(55421, 'gym'),
(89607, 'world_history'),
(54522, 'world_history'),
(49581, 'world_history'),
(84155, 'world_history'),
(55421, 'world_history'),
(57450, 'world_history');

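I'd ultimately like to use Circos (background here), but I'm happy to use any approach that lets me understand, and show people, where the overlap is greatest and smallest. This is a bit over my head, but I think I could use an output table with one row and one column per class, where each cell gives the number of students shared by the two classes that intersect there. The cell where a class intersects with itself could show the number of people who don't overlap with any other class.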

Screenshot of a 3x3 matrix from Excel
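To pin down exactly what the requested matrix contains, here is a minimal sketch in Python (used only as an illustration, outside MySQL) that computes it from the sample rows with plain sets. The function name `overlap_matrix` is hypothetical. Off-diagonal cells count students who take both classes; diagonal cells count students who take no other class, as proposed above. It assumes a repeated (student, class) row should count as a single enrollment.

```python
from collections import defaultdict

# Sample enrollments from the question: one (student_id, class) row each.
ROWS = [
    (55421, "algebra"), (27494, "algebra"), (64934, "algebra"),
    (65364, "algebra"), (21102, "algebra"), (90734, "algebra"),
    (20103, "algebra"),
    (57450, "gym"), (76411, "gym"), (24918, "gym"), (65364, "gym"),
    (55421, "gym"),
    (89607, "world_history"), (54522, "world_history"),
    (49581, "world_history"), (84155, "world_history"),
    (55421, "world_history"), (57450, "world_history"),
]


def overlap_matrix(rows):
    """Return (classes, matrix) for the overlap table.

    matrix[i][j] (i != j): students enrolled in both classes[i] and classes[j].
    matrix[i][i]: students enrolled in classes[i] and in no other class.
    """
    students_by_class = defaultdict(set)
    classes_by_student = defaultdict(set)
    for student_id, cls in rows:
        students_by_class[cls].add(student_id)
        classes_by_student[student_id].add(cls)

    classes = sorted(students_by_class)
    matrix = []
    for a in classes:
        row = []
        for b in classes:
            if a == b:
                # Students whose only class is this one.
                only_here = [s for s in students_by_class[a]
                             if len(classes_by_student[s]) == 1]
                row.append(len(only_here))
            else:
                # Students present in both classes.
                row.append(len(students_by_class[a] & students_by_class[b]))
        matrix.append(row)
    return classes, matrix


classes, matrix = overlap_matrix(ROWS)
for name, row in zip(classes, matrix):
    print(f"{name:<15}{row}")
```

For the sample data this gives the same figures as the final query in the answer (5/2/1, 2/2/2, 1/2/4), which makes it a handy cross-check when adapting the SQL to the real table.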

Best answer

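You can do this by producing a result in which each row represents a link: src -> dst = nb (nb being the number of students shared by the two classes).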
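Each link row is one cell of the square matrix the question asks for, so turning links back into a grid is a simple pivot. A minimal sketch, assuming the links arrive as `(src, dst, count)` tuples (for example, rows fetched from one of the queries in this answer); `links_to_matrix` is a hypothetical helper name, and any pair missing from the input is filled with 0.

```python
def links_to_matrix(links):
    """Pivot (src, dst, count) link rows into (labels, square matrix).

    Labels are sorted; any (src, dst) pair absent from `links` becomes 0.
    """
    labels = sorted({src for src, _, _ in links} | {dst for _, dst, _ in links})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for src, dst, count in links:
        matrix[index[src]][index[dst]] = count
    return labels, matrix


# Two links in both directions: algebra and gym share 2 students.
labels, matrix = links_to_matrix([("algebra", "gym", 2), ("gym", "algebra", 2)])
print(labels)
print(matrix)
```

Keeping the query output in this long src/dst/count form and pivoting afterwards avoids writing one SQL column per class, which matters when there are about 20 of them.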

1) Get the matrix

select c1.class src_class, c2.class dst_class
from (select distinct class from classes) c1
cross join (select distinct class from classes) c2
order by src_class, dst_class

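The `select distinct class` is not required to generate the matrix; you could select `class` directly and use GROUP BY instead. In step 2, however, we need each class to appear exactly once.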

Result:

src_class      dst_class
-----------------------------
algebra        algebra
algebra        gym
algebra        world_history
gym            algebra
gym            gym
gym            world_history
world_history  algebra
world_history  gym
world_history  world_history
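Step 1 can be tried without a MySQL server using Python's built-in `sqlite3` module. SQLite is only a stand-in here, so MySQL-specific behaviour is not exercised. This sketch loads the sample rows, builds the pair list with an explicit `CROSS JOIN` (in MySQL, a `JOIN` with no `ON` clause behaves the same way), and checks that `GROUP BY class` returns the same class list as `SELECT DISTINCT class`.

```python
import sqlite3

# Sample enrollments from the question: one (student_id, class) row each.
ROWS = [
    (55421, "algebra"), (27494, "algebra"), (64934, "algebra"),
    (65364, "algebra"), (21102, "algebra"), (90734, "algebra"),
    (20103, "algebra"),
    (57450, "gym"), (76411, "gym"), (24918, "gym"), (65364, "gym"),
    (55421, "gym"),
    (89607, "world_history"), (54522, "world_history"),
    (49581, "world_history"), (84155, "world_history"),
    (55421, "world_history"), (57450, "world_history"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classes (student_id INTEGER, class VARCHAR(13))")
conn.executemany("INSERT INTO classes (student_id, class) VALUES (?, ?)", ROWS)

# Step 1: every (source, destination) pair of class names.
pairs = conn.execute("""
    SELECT c1.class AS src_class, c2.class AS dst_class
    FROM (SELECT DISTINCT class FROM classes) c1
    CROSS JOIN (SELECT DISTINCT class FROM classes) c2
    ORDER BY c1.class, c2.class
""").fetchall()

# The same class list obtained with GROUP BY instead of DISTINCT.
distinct_classes = conn.execute(
    "SELECT DISTINCT class FROM classes ORDER BY class").fetchall()
grouped_classes = conn.execute(
    "SELECT class FROM classes GROUP BY class ORDER BY class").fetchall()

for src, dst in pairs:
    print(f"{src:<15}{dst}")
```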

2) Join the students who match both the source and the destination class

select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
from (select distinct class from classes) c1
cross join (select distinct class from classes) c2
left join classes v on
(
    v.class = c1.class
    and v.student_id in (select student_id from classes
                         where class = c2.class)
)
group by src_class, dst_class
order by src_class, dst_class

Taking the distinct values (step 1) gives us every pair of classes, even pairs with no link between them (their count is then 0).

Result:

src_class      dst_class      overlap
-------------------------------------
algebra        algebra           7
algebra        gym               2
algebra        world_history     1
gym            algebra           2
gym            gym               5
gym            world_history     2
world_history  algebra           1
world_history  gym               2
world_history  world_history     6
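The step 2 query can be checked the same way, again with SQLite as a stand-in for MySQL. The sketch also adds one hypothetical enrollment (student 99999 in an `art` class, which is not in the question's data) to illustrate the point about distinct values: because the pairs come from the distinct class lists and `v` is `LEFT JOIN`ed, a pair with no shared students still appears, with a count of 0.

```python
import sqlite3

# Sample enrollments from the question: one (student_id, class) row each.
ROWS = [
    (55421, "algebra"), (27494, "algebra"), (64934, "algebra"),
    (65364, "algebra"), (21102, "algebra"), (90734, "algebra"),
    (20103, "algebra"),
    (57450, "gym"), (76411, "gym"), (24918, "gym"), (65364, "gym"),
    (55421, "gym"),
    (89607, "world_history"), (54522, "world_history"),
    (49581, "world_history"), (84155, "world_history"),
    (55421, "world_history"), (57450, "world_history"),
]

# Step 2: for each pair, count students of src_class who also take dst_class.
STEP2 = """
SELECT c1.class AS src_class, c2.class AS dst_class,
       COUNT(v.student_id) AS overlap
FROM (SELECT DISTINCT class FROM classes) c1
CROSS JOIN (SELECT DISTINCT class FROM classes) c2
LEFT JOIN classes v
       ON v.class = c1.class
      AND v.student_id IN (SELECT student_id FROM classes
                           WHERE class = c2.class)
GROUP BY c1.class, c2.class
ORDER BY c1.class, c2.class
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classes (student_id INTEGER, class VARCHAR(13))")
conn.executemany("INSERT INTO classes (student_id, class) VALUES (?, ?)", ROWS)
links = conn.execute(STEP2).fetchall()
for row in links:
    print(row)

# Hypothetical extra enrollment (not in the question's data): a class that
# shares no students with any other class still gets its pairs, with 0.
conn.execute("INSERT INTO classes (student_id, class) VALUES (99999, 'art')")
with_art = conn.execute(STEP2).fetchall()
art_pairs = [r for r in with_art if "art" in r[:2] and r[0] != r[1]]
print(art_pairs)
```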

3) Count differently when the two classes are the same

select c1.class src_class, c2.class dst_class, count(v.student_id) overlap
from (select distinct class from classes) c1
cross join (select distinct class from classes) c2
left join classes v on
(
    v.class = c1.class and
    (
        -- When the classes are equal:
        -- students present only in that class
        (c1.class = c2.class
         and 1 = (select count(*) from classes
                  where student_id = v.student_id))
    or
        -- When the classes are different:
        -- students present in both classes
        (c1.class != c2.class
         and v.student_id in (select student_id from classes
                              where class = c2.class))
    )
)
group by src_class, dst_class
order by src_class, dst_class

Result:

src_class      dst_class      overlap
-------------------------------------
algebra        algebra           5
algebra        gym               2
algebra        world_history     1
gym            algebra           2
gym            gym               2
gym            world_history     2
world_history  algebra           1
world_history  gym               2
world_history  world_history     4
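Step 3 can also be run in the SQLite stand-in and then pivoted into the grid the question sketches. Note that the subquery `SELECT COUNT(*) ... WHERE student_id = v.student_id` counts rows, so this sketch assumes, as in the sample data, that a student never has two rows for the same class; a duplicated row would make that student look like they take more than one class.

```python
import sqlite3

# Sample enrollments from the question: one (student_id, class) row each.
ROWS = [
    (55421, "algebra"), (27494, "algebra"), (64934, "algebra"),
    (65364, "algebra"), (21102, "algebra"), (90734, "algebra"),
    (20103, "algebra"),
    (57450, "gym"), (76411, "gym"), (24918, "gym"), (65364, "gym"),
    (55421, "gym"),
    (89607, "world_history"), (54522, "world_history"),
    (49581, "world_history"), (84155, "world_history"),
    (55421, "world_history"), (57450, "world_history"),
]

STEP3 = """
SELECT c1.class AS src_class, c2.class AS dst_class,
       COUNT(v.student_id) AS overlap
FROM (SELECT DISTINCT class FROM classes) c1
CROSS JOIN (SELECT DISTINCT class FROM classes) c2
LEFT JOIN classes v
       ON v.class = c1.class
      AND (
            -- Same class: students with exactly one enrollment row
            (c1.class = c2.class
             AND 1 = (SELECT COUNT(*) FROM classes
                      WHERE student_id = v.student_id))
         OR
            -- Different classes: students enrolled in both
            (c1.class != c2.class
             AND v.student_id IN (SELECT student_id FROM classes
                                  WHERE class = c2.class))
          )
GROUP BY c1.class, c2.class
ORDER BY c1.class, c2.class
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classes (student_id INTEGER, class VARCHAR(13))")
conn.executemany("INSERT INTO classes (student_id, class) VALUES (?, ?)", ROWS)
rows = conn.execute(STEP3).fetchall()

# Rows are ordered by (src, dst) and cover every pair, so each src's counts
# already appear in dst order; collecting them per src yields the matrix.
labels = sorted({src for src, _, _ in rows})
matrix = [[n for src, _, n in rows if src == label] for label in labels]
for label, row in zip(labels, matrix):
    print(f"{label:<15}{row}")
```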

For "mysql - Calculating overlap in MySQL", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37148559/
