I'm new to Python (and to coding of any kind), so I apologize if this is a bit confusing.
I have a csv file that looks like this:
A B C D E F G H
14 BP1 BP1-19119308 OR1A1 19119308 chip-chip Hs578T human 11/23/09
15 BP1 BP1-19119308 PTPRE 19119308 chip-chip Hs578T human 11/23/09
16 BP1 BP1-19119308 SELE 19119308 chip-chip Hs578T human 11/23/09
17 BP1 BP1-19119308 TAC3 19119308 chip-chip Hs578T human 11/23/09
18 BP1 BP1-19119308 VEGFA 19119308 chip-chip Hs578T human 11/23/09
19 CHD7 CHD7-19251738 APOA1 19251738 chip-chip MESC mouse 11/23/09
20 CHD7 CHD7-19251738 ARHGAP26 19251738 chip-chip MESC mouse 11/23/09
I need to make it look like this:
BP1-19119308-chip-chip-Hs578T-human OR1A1 PTPRE SELE TAC3 VEGFA
CHD7-19251738-chip-chip-MESC-mouse APOA1 ARHGAP26
I did manage to combine C-F-G-H into the first column:
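import csv

# Read every row of the input file into a list
with open('test.csv', 'rt', encoding='utf8', newline='') as infile:
    data = [row for row in csv.reader(infile)]

# Build the combined key (columns 2, 5, 6, 7) and keep the value from column 3
new_data = [[row[2] + '-' + row[5] + '-' + row[6] + '-' + row[7], row[3]] for row in data]
print(new_data)

# Write the key/value pairs to a new file
with open('new_data.csv', 'wt', newline='') as out:
    output = csv.writer(out)
    for row in new_data:
        output.writerow(row)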
A B
BP1-19119308-chip-chip-Hs578T-human OR1A1
BP1-19119308-chip-chip-Hs578T-human PTPRE
BP1-19119308-chip-chip-Hs578T-human SELE
BP1-19119308-chip-chip-Hs578T-human TAC3
BP1-19119308-chip-chip-Hs578T-human VEGFA
CHD7-19251738-chip-chip-MESC-mouse APOA1
CHD7-19251738-chip-chip-MESC-mouse ARHGAP26
CHD7-19251738-chip-chip-MESC-mouse ATP11A
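But now I have these duplicates in A, and I don't know how to remove them and transpose all of the values in B that belong to each duplicate.
I tried looping again to compare the current value with the previous one, but I messed the whole thing up.
Any suggestions?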
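Best answer
You want to use a dictionary. If you are doing further analysis, keep the aggregated values in a list for each identifier. Your identifier string is the key, and under each key you have a list of values.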
new_keys = [row[2] + '-' + row[5] + '-' + row[6] + '-' + row[7] for row in data]
new_values = [row[3] for row in data]
aggregate_values = {}  # An empty dictionary
# Iterate across the paired lists together
for key, value in zip(new_keys, new_values):
    if key not in aggregate_values:
        # First time this key is seen: start a new list
        aggregate_values[key] = [value]
    else:
        aggregate_values[key].append(value)
# Printed output: one line per key, followed by its values
for key in aggregate_values:
    print(key + " " + " ".join(aggregate_values[key]))
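The snippet above only prints the grouped values. Since the question asks for a file with one row per key, here is a minimal end-to-end sketch of the same dictionary approach. It assumes the input is comma-separated and that the columns sit at the indexes used in the question; the helper name group_rows and the in-memory sample data are hypothetical and stand in for the real files.

```python
import csv
import io


def group_rows(rows):
    """Group column-3 values under a key built from columns 2, 5, 6 and 7."""
    grouped = {}  # dicts keep insertion order in Python 3.7+
    for row in rows:
        key = "-".join([row[2], row[5], row[6], row[7]])
        # setdefault creates the list the first time a key is seen
        grouped.setdefault(key, []).append(row[3])
    return grouped


# A few rows from the question, assumed here to be comma-separated.
sample = """14,BP1,BP1-19119308,OR1A1,19119308,chip-chip,Hs578T,human,11/23/09
15,BP1,BP1-19119308,PTPRE,19119308,chip-chip,Hs578T,human,11/23/09
19,CHD7,CHD7-19251738,APOA1,19251738,chip-chip,MESC,mouse,11/23/09
20,CHD7,CHD7-19251738,ARHGAP26,19251738,chip-chip,MESC,mouse,11/23/09
"""

grouped = group_rows(csv.reader(io.StringIO(sample)))

# Write one row per key: the key first, then every value collected for it.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for key, values in grouped.items():
    writer.writerow([key] + values)

result = buffer.getvalue()
print(result)
```

To work with real files, replace the two io.StringIO objects with open('test.csv', newline='') and open('new_data.csv', 'w', newline=''). Grouping by dictionary key also handles duplicates that are not adjacent in the input, which a comparison against only the previous row would miss.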
Regarding "python - iterate and discard sequential duplicates", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33456550/