I have a dataset of word triples and their frequencies, for example:
w1  w2   w3    freq
a   a    a     4
a   a    and   3
a   a    band  1
a   a    well  1
a   and  a     2
For each observation, I want to compute counts according to the following table:
             (w3)  not(w3)
(w1,w2)      n1    n2
not(w1,w2)   n3    n4
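Here n1, ..., n4 are the sums of the frequencies of the observations that satisfy each condition. For example, the first observation has w1 = a, w2 = a, w3 = a. First we look at all observations where w1 = a, w2 = a, w3 = a. Only one observation satisfies that condition, and its frequency is 4, so n1 = 4. Next we take w1 = a, w2 = a, w3 != a, which matches observations with frequencies 3, 1, 1, for a total of 5, so n2 = 5. Finally, w1 != a, w2 != a, w3 = a gives 0, and w1 != a, w2 != a, w3 != a gives 0, so n3 = 0 and n4 = 0.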
I want a table with output like this:
w1  w2   w3    freq  n1  n2  n3  n4
a   a    a     4     4   5   0   0
a   a    and   3     3   6   0   0
a   a    band  1
a   a    well  1
a   and  a     2
etc.
How can I do this with sqlite3?
Best answer
This can be done with correlated scalar subqueries:
SELECT w1,
       w2,
       w3,
       freq,
       (SELECT SUM(freq)
        FROM MyLittleTable AS T2
        WHERE T2.w1 = T1.w1
          AND T2.w2 = T1.w2
          AND T2.w3 = T1.w3
       ) AS n1,
       (SELECT SUM(freq)
        FROM MyLittleTable AS T2
        WHERE T2.w1 = T1.w1
          AND T2.w2 = T1.w2
          AND T2.w3 != T1.w3
       ) AS n2,
       ...
FROM MyLittleTable AS T1
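For anyone who wants to try this end to end, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. A few things in it are assumptions rather than part of the answer above: the column types, the sample rows (copied from the question), and the n3 and n4 subqueries, which the answer elides. They are written here following the question's worked example, reading not(w1,w2) as w1 != a AND w2 != a. The COALESCE(..., 0) wrapper is also an addition: SUM() over zero matching rows returns NULL in SQLite, while the desired output shows 0.

```python
import sqlite3

# In-memory database; the table name follows the answer, the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyLittleTable (w1 TEXT, w2 TEXT, w3 TEXT, freq INTEGER)"
)

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO MyLittleTable VALUES (?, ?, ?, ?)",
    [
        ("a", "a", "a", 4),
        ("a", "a", "and", 3),
        ("a", "a", "band", 1),
        ("a", "a", "well", 1),
        ("a", "and", "a", 2),
    ],
)

# n1 and n2 follow the answer. n3 and n4 are an assumed completion based on
# the question's worked example (w1 != ... AND w2 != ...).
# COALESCE turns the NULL that SUM returns for "no matching rows" into 0.
QUERY = """
SELECT w1, w2, w3, freq,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 = T1.w1 AND T2.w2 = T1.w2
                   AND T2.w3 = T1.w3), 0) AS n1,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 = T1.w1 AND T2.w2 = T1.w2
                   AND T2.w3 != T1.w3), 0) AS n2,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 != T1.w1 AND T2.w2 != T1.w2
                   AND T2.w3 = T1.w3), 0) AS n3,
       COALESCE((SELECT SUM(freq) FROM MyLittleTable AS T2
                 WHERE T2.w1 != T1.w1 AND T2.w2 != T1.w2
                   AND T2.w3 != T1.w3), 0) AS n4
FROM MyLittleTable AS T1
ORDER BY w1, w2, w3
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(*row)
```

With the sample rows, the first two result rows come out as 4 5 0 0 and 3 6 0 0 for n1 to n4, matching the table in the question. If not(w1,w2) is instead meant as "not both equal", the n3 and n4 conditions would become NOT (T2.w1 = T1.w1 AND T2.w2 = T1.w2), which gives different counts.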
For more on "sql - Counting occurrences of features in sqlite", see the similar question on Stack Overflow: https://stackoverflow.com/questions/26152506/