I have a CSV file that contains the following matrix:
A1 A2 A3 A4
B1 0.2 0.3 0.7 .5
B2 0.5 0.55 0.4 0.6
B3 0.9 0.13 0.5 0.16
B4 0.2 0.4 0.6 0.8
I want to output the entries whose value is 0.5 or greater, in the following format:
A1 B2 B3
A2 B2
A3 B1 B3 B4
and so on. Please help me.
Here is what I have tried:
import csv

ifile = open('gene.matrix.csv', newline='')
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print('%-8s: %s' % (header[colnum], col))
            colnum += 1
    rownum += 1
ifile.close()
Best answer
Alternatively, if you have pandas, the index and column labels are easy to get:
In [2]: import pandas as pd
# df = pd.read_csv('gene.matrix.csv', delimiter=r'\s+')
In [3]: df = pd.read_clipboard()  # from your sample
# "df >= 0.5" locates the qualifying values
# .T transposes so that the index/columns come out the way you expect
# stack() pivots a level of the (possibly hierarchical) column labels into the index
In [4]: groups = df[df >= 0.5].T.stack()
In [5]: groups
Out[5]:
A1  B2    0.50
    B3    0.90
A2  B2    0.55
A3  B1    0.70
    B3    0.50
    B4    0.60
A4  B1    0.50
    B2    0.60
    B4    0.80
dtype: float64
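To reproduce the session without the clipboard, here is a minimal sketch that loads the sample matrix from an in-memory string. The `SAMPLE` constant and the trailing `dropna()` are additions for illustration and not part of the original answer. The `dropna()` is there on the assumption that some pandas versions may keep the masked `NaN` entries when stacking.

```python
import io

import pandas as pd

# Sample matrix copied from the question (whitespace separated).
SAMPLE = """\
A1 A2 A3 A4
B1 0.2 0.3 0.7 .5
B2 0.5 0.55 0.4 0.6
B3 0.9 0.13 0.5 0.16
B4 0.2 0.4 0.6 0.8
"""

# The header has one field fewer than the data rows, so pandas uses
# the first column (B1..B4) as the index.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Mask values below 0.5, transpose so A* becomes the outer level,
# stack into a Series, and drop any remaining NaN entries.
groups = df[df >= 0.5].T.stack().dropna()

print(groups)
```

Each entry of `groups.index` is a `(column label, row label)` pair, which is what the grouping step relies on.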
One way to get the desired output:
# collect the required output in a dict that maps each column label to a list of row labels
In [6]: result = {}
In [7]: for i in groups.index:
   ...:     if i[0] in result:
   ...:         result[i[0]].append(i[1])
   ...:     else:
   ...:         result[i[0]] = [i[1]]
   ...:
In [8]: result
Out[8]:
{'A1': ['B2', 'B3'],
 'A2': ['B2'],
 'A3': ['B1', 'B3', 'B4'],
 'A4': ['B1', 'B2', 'B4']}
# print the expected output; dicts keep insertion order on Python 3.7+ (use OrderedDict on older versions)
In [9]: for k, v in result.items():
   ...:     print(k, " ".join(v))
   ...:
A1 B2 B3
A2 B2
A3 B1 B3 B4
A4 B1 B2 B4
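The grouping loop can arguably be written more compactly with `collections.defaultdict`, which removes the explicit membership check. The sketch below is a variation, not the original answer's code. `group_pairs` is a hypothetical helper name, and `pairs` is a hand-written stand-in for `list(groups.index)`, so the example runs without pandas.

```python
from collections import defaultdict

# Stand-in for list(groups.index): (column label, row label) pairs.
pairs = [
    ("A1", "B2"), ("A1", "B3"),
    ("A2", "B2"),
    ("A3", "B1"), ("A3", "B3"), ("A3", "B4"),
    ("A4", "B1"), ("A4", "B2"), ("A4", "B4"),
]


def group_pairs(pairs):
    """Group row labels under their column label, keeping input order."""
    grouped = defaultdict(list)
    for col, row in pairs:
        grouped[col].append(row)
    return dict(grouped)


result = group_pairs(pairs)
for col, rows in result.items():
    print(col, " ".join(rows))
```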
Edit:
To write the result to a text file line by line, just do the following:
with open("output.csv", "w") as f:
    for k, v in result.items():
        f.write("%s %s\n" % (k, " ".join(v)))
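If the writing step is needed in more than one place, it could be wrapped in a small helper that accepts any text file object. This is only a sketch: `write_groups` and `sample_result` are illustrative names, and an in-memory `io.StringIO` buffer stands in for a real file so that nothing is written to disk.

```python
import io


def write_groups(result, fh):
    """Write one 'key member member ...' line per dict entry to fh."""
    for key, members in result.items():
        fh.write("%s %s\n" % (key, " ".join(members)))


# Illustrative data in the same shape as the result dict.
sample_result = {"A1": ["B2", "B3"], "A2": ["B2"]}

buffer = io.StringIO()
write_groups(sample_result, buffer)
written = buffer.getvalue()
print(written, end="")
```

With a real file, the call would look like `with open("output.csv", "w") as f: write_groups(result, f)`.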
I may have overcomplicated things for your example, but this is certainly one way to do it.
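For comparison, the same result can be sketched with the standard library alone. This is an alternative to the pandas approach rather than part of the original answer. The function name `labels_at_or_above` is hypothetical, and the input is assumed to be whitespace separated, with a header row that has one field fewer than the data rows, as in the sample.

```python
SAMPLE = """\
A1 A2 A3 A4
B1 0.2 0.3 0.7 .5
B2 0.5 0.55 0.4 0.6
B3 0.9 0.13 0.5 0.16
B4 0.2 0.4 0.6 0.8
"""


def labels_at_or_above(text, threshold=0.5):
    """Map each column label to the row labels whose value >= threshold."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    result = {name: [] for name in header}
    for label, *values in rows:
        # Pair each value with its column label from the header.
        for name, value in zip(header, values):
            if float(value) >= threshold:
                result[name].append(label)
    return result


result = labels_at_or_above(SAMPLE)
for name, labels in result.items():
    print(name, " ".join(labels))
```

To read from a file instead, the contents of `open('gene.matrix.csv').read()` could be passed in place of `SAMPLE`.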
Regarding "python - print values greater than 0.5 from a csv file, by column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32399920/