python - Python dictionary: extracting keys with multiple values

I have two files, and I am trying to extract some values from file 1, as shown below:

File 1:
2    word1
4    word2
4    word2_1
4    word2_2
8    word5
8    word5_3

File 2:
4
8

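What I want is to extract every line that starts with 4 or 8 (the values listed in file 2), and there are many such lines. Normally, if only one line matched each key, I would use a Python dictionary: one key, one element, easy! But now several elements match the same key, and my script only extracts the last one (obviously, each assignment overwrites the previous value as the loop goes on). So I understand that a plain dictionary does not work this way, but I don't know what to do instead. I'd be glad if someone could help me get started.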

Here is my "usual" code:

gene_count = {}
my_file = open('file1.txt')
for line in my_file:
    columns = line.strip().split()
    gene = columns[0]
    count = columns[1:13]
    gene_count[gene] = count

names_file = open('file2.txt')
output_file = open('output.txt', 'w')

for line in names_file:
    gene = line.strip()
    count = gene_count[gene]
    output_file.write('{0}\t{1}\n'.format(gene,"\t".join(count)))

output_file.close()

Best answer

Create a dictionary whose values are lists, and append to those lists.

In general:

from collections import defaultdict
my_dict = defaultdict(list)  # a missing key starts out as an empty list

for x in range(101):
    if x % 2 == 0:
        my_dict['evens'].append(str(x))
    else:
        my_dict['odds'].append(str(x))

print('evens:', ' '.join(my_dict['evens']))
print('odds:', ' '.join(my_dict['odds']))

In your case the values are already lists, so add (concatenate) each list to the list stored in the dictionary:

from collections import defaultdict
gene_count = defaultdict(list)  # each key maps to a list of values

with open('file1.txt') as my_file:
    for line in my_file:
        columns = line.strip().split()
        gene = columns[0]
        count = columns[1:13]
        gene_count[gene] += count  # extend the list instead of overwriting it

with open('file2.txt') as names_file, open('output.txt', 'w') as output_file:
    for line in names_file:
        gene = line.strip()
        count = gene_count[gene]
        output_file.write('{0}\t{1}\n'.format(gene, "\t".join(count)))
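
As a quick check, the same approach can be tried on the sample data from the question without touching any files. This is only a sketch: `io.StringIO` stands in for `file1.txt` and `file2.txt`, and the helper name `group_by_key` is made up for illustration.

```python
import io
from collections import defaultdict

# Sample data copied from the question; StringIO stands in for the real files.
file1 = io.StringIO(
    "2    word1\n"
    "4    word2\n"
    "4    word2_1\n"
    "4    word2_2\n"
    "8    word5\n"
    "8    word5_3\n"
)
file2 = io.StringIO("4\n8\n")


def group_by_key(lines):
    """Map each first column to the list of all values seen for it."""
    grouped = defaultdict(list)
    for line in lines:
        columns = line.split()
        if columns:  # skip blank lines
            grouped[columns[0]].extend(columns[1:13])
    return grouped


gene_count = group_by_key(file1)

output_lines = []
for line in file2:
    gene = line.strip()
    # .get avoids creating empty entries for keys that are missing from file 1
    count = gene_count.get(gene, [])
    output_lines.append('{0}\t{1}'.format(gene, "\t".join(count)))

print("\n".join(output_lines))
```

With the sample data, key 4 should come out with three values and key 8 with two, while key 2 is dropped because it is not listed in file 2.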

If what you actually want to print is the number of entries for each gene, replace "\t".join(count) with len(count), the length of the list.
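
A minimal sketch of that `len` variant, assuming the rows of file 1 have already been parsed into (key, value) pairs. The `rows` and `wanted` lists are hypothetical stand-ins for the two files:

```python
from collections import defaultdict

# Hypothetical in-memory rows standing in for the parsed lines of file 1.
rows = [("2", "word1"), ("4", "word2"), ("4", "word2_1"),
        ("4", "word2_2"), ("8", "word5"), ("8", "word5_3")]

gene_count = defaultdict(list)
for gene, value in rows:
    gene_count[gene].append(value)

# Hypothetical stand-in for the keys listed in file 2.
wanted = ["4", "8"]

# len() gives the number of values collected for each key.
count_lines = ['{0}\t{1}'.format(gene, len(gene_count.get(gene, [])))
               for gene in wanted]
print("\n".join(count_lines))
```

`str.format` converts the integer returned by `len` on its own, so no explicit `str()` call is needed here.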

Regarding python - Python dictionary: extracting keys with multiple values, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25638734/
