python - Calculating the individual frequencies of both elements of each element pair from a numpy matrix

Tags: python numpy counter python-itertools

Given this numpy matrix:

>>> print matrix
[['L' 'G' 'T' 'G' 'A' 'P' 'V' 'I']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['G' 'L' 'T' 'G' 'A' 'P' 'V' 'I']]

I already have this code:

import collections
import itertools

for i, j in itertools.combinations(range(len(matrix.T)), 2):
    c = matrix[:, [i, j]]
    counts = collections.Counter(map(tuple, c))
    print 'columns {} and {}'.format(i, j)
    for AB in counts:
        freq_AB = float(counts[AB]) / len(c)
        print 'Frequency of {} = {}'.format(AB, freq_AB)
    print

which produces:

columns 0 and 1
Frequency of ('A', 'A') = 0.5
Frequency of ('G', 'L') = 0.25
Frequency of ('L', 'G') = 0.25

columns 0 and 2
Frequency of ('A', 'S') = 0.5
Frequency of ('G', 'T') = 0.25
Frequency of ('L', 'T') = 0.25

[...]

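What I would like to add to the code is, for each pair of letters taken from columns i and j, the frequency of each letter within its own column. In other words, output like the following: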

columns 0 and 1
Frequency of ('A', 'A') = 0.5
  Freq of 'A' in column 0 = 0.5
  Freq of 'A' in column 1 = 0.5
Frequency of ('G', 'L') = 0.25
  Freq of 'G' in column 0 = 0.25
  Freq of 'L' in column 1 = 0.25
Frequency of ('L', 'G') = 0.25
  Freq of 'L' in column 0 = 0.25
  Freq of 'G' in column 1 = 0.25

columns 0 and 2
Frequency of ('A', 'S') = 0.5
  Freq of 'A' in column 0 = 0.5
  Freq of 'S' in column 2 = 0.5
Frequency of ('G', 'T') = 0.25
  Freq of 'G' in column 0 = 0.25
  Freq of 'T' in column 2 = 0.5
Frequency of ('L', 'T') = 0.25
  Freq of 'L' in column 0 = 0.25
  Freq of 'T' in column 2 = 0.5

[...]

Any help would be greatly appreciated.

Best answer

How about extending the same approach, like this:

import collections
import itertools

for i, j in itertools.combinations(range(len(matrix.T)), 2):
    c = matrix[:, [i, j]]
    combined_counts = collections.Counter(map(tuple, c))
    # Per-column counts give the individual letter frequencies.
    first_column_counts = collections.Counter(c[:, 0])
    second_column_counts = collections.Counter(c[:, 1])
    print 'columns {} and {}'.format(i, j)
    for AB in combined_counts:
        freq_AB = float(combined_counts[AB]) / len(c)
        print 'Frequency of {} = {}'.format(AB, freq_AB)
        freq_A = float(first_column_counts[AB[0]]) / len(c)
        print "  Freq of '{}' in column {} = {}".format(AB[0], i, freq_A)
        freq_B = float(second_column_counts[AB[1]]) / len(c)
        # The second letter comes from column j, not column i.
        print "  Freq of '{}' in column {} = {}".format(AB[1], j, freq_B)
    print
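
The snippets in this thread use Python 2 print statements. For anyone on Python 3, the same idea could be sketched roughly as below. This is only one possible arrangement, not the original poster's code: the function name pair_and_column_frequencies and the way the example matrix is rebuilt from strings are assumptions made for illustration. It also counts each column once up front rather than recounting it for every column pair.

```python
import collections
import itertools

import numpy as np


def pair_and_column_frequencies(matrix):
    """Hypothetical helper: map (i, j) to a list of
    (pair, freq_of_pair, freq_of_first_in_col_i, freq_of_second_in_col_j)."""
    n_rows, n_cols = matrix.shape
    # Count every column once; .tolist() yields plain Python strings.
    column_counts = [
        collections.Counter(matrix[:, k].tolist()) for k in range(n_cols)
    ]
    result = {}
    for i, j in itertools.combinations(range(n_cols), 2):
        pair_counts = collections.Counter(
            zip(matrix[:, i].tolist(), matrix[:, j].tolist())
        )
        result[(i, j)] = [
            (
                pair,
                count / n_rows,
                column_counts[i][pair[0]] / n_rows,
                column_counts[j][pair[1]] / n_rows,
            )
            for pair, count in sorted(pair_counts.items())
        ]
    return result


# The example matrix from the question, rebuilt row by row (assumed construction).
matrix = np.array(
    [list(row) for row in ("LGTGAPVI", "AASGPSSG", "AASGPSSG", "GLTGAPVI")]
)

for (i, j), rows in pair_and_column_frequencies(matrix).items():
    print('columns {} and {}'.format(i, j))
    for pair, freq_ab, freq_a, freq_b in rows:
        print('Frequency of {} = {}'.format(pair, freq_ab))
        print("  Freq of '{}' in column {} = {}".format(pair[0], i, freq_a))
        print("  Freq of '{}' in column {} = {}".format(pair[1], j, freq_b))
    print()
```

Returning the numbers in a dict instead of printing them directly makes it easier to reuse them afterwards, for example to compare the pair frequency with the product of the two column frequencies.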

Regarding "python - Calculating the individual frequencies of both elements of each element pair from a numpy matrix", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15212838/
