I have a data file that looks like this:
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
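and so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight; the lines that follow are the words in that topic and their respective weights.
Now I want to sum the second column for each topic and check the value it gives. That is, I want output like this: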
Topic:topic_0 20
Topic:topic_1 10
That is, the topic name and the sum of its column values (for example, in the first topic, topic_0, the word weights are 2, 5, 3, 10, which sum to 20). I tried:
with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]
        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print '\n'
            print columns[0]
However, I'm not sure how to proceed from here. This only prints the topic names. Please help!
Best answer
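import re

# Sample data from the question, inlined so the snippet runs on its own.
data = """
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
"""

result = {}  # topic header -> list of word weights
for line in data.splitlines():
    line = line.strip()
    if not line:
        continue
    columns = re.split(r"\s+", line)
    value = columns[0]
    if value[:6] == 'TOPIC:':
        # A header line starts a new topic; its own weight is not summed.
        result[value] = []
        points = result[value]
        continue
    # Any other line is "word weight"; collect it under the current topic.
    points.append(float(columns[1]))

for k, v in result.items():
    # Formatted so the call prints the same text on Python 2 and Python 3.
    print("%s %s" % (k, sum(v)))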
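With around 2000 topics, it may be simpler to keep only a running total per topic instead of a list of weights. The following is a minimal Python 3 sketch of that variant, not part of the original answer. The function name `sum_topic_weights` is hypothetical, and it assumes every data line has the form `word weight` and that header lines start with `TOPIC:`.

```python
import io


def sum_topic_weights(lines):
    """Return (topic, total) pairs in the order the topics appear.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    totals = []  # each entry is [topic_header, running_sum]
    for raw in lines:
        parts = raw.split()  # splits on any whitespace, drops the newline
        if not parts:
            continue  # skip blank lines
        if parts[0].startswith("TOPIC:"):
            # New topic: start its sum at zero (the header weight is ignored).
            totals.append([parts[0], 0.0])
        elif totals and len(parts) >= 2:
            # Word line: add its weight to the most recent topic.
            totals[-1][1] += float(parts[1])
    return [(topic, total) for topic, total in totals]


# In-memory stand-in for the file, using the sample data from the question.
sample = io.StringIO(
    "TOPIC:topic_0 2056\nab 2.0\ncd 5.0\nef 3.0\ngh 10.0\n"
    "TOPIC:topic_1 1000\naa 3.0\nbd 5.0\ngh 2.0\n"
)
for topic, total in sum_topic_weights(sample):
    print(topic, total)
```

Because the function only iterates over lines, passing the handle from `with open('Input.txt') as in_file:` should work the same way as the in-memory sample.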
Regarding "python - summing a column in a text file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33572311/