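Basically, I am trying to add together the count-column values of rows in a csv file that share the same value in the item column. I then need to sort the results alphabetically (ascending) by the item-column value. For example: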
Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100
I need to add the values 123 and 100 together to get a single new row for Diabetes Mellitus.
It should look like this:
Diabetes Mellitus, 223
Here is my code so far:
import csv
import sys

with open(sys.argv[1], 'r') as File:
    reader = csv.reader(File)
    itemindex = sys.argv[2]    # name of the item column
    countindex = sys.argv[3]   # name of the count column
    itemcolumn = 0
    countcolumn = 0
    firstrow = True
    dictionary = {}
    for row in reader:
        if firstrow == True:
            # The header row tells us which positions hold the two columns.
            firstrow = False
            itemcolumn = row.index(itemindex)
            countcolumn = row.index(countindex)
        else:
            if row[itemcolumn] in dictionary:
                # Add the item at the row's count column (converted to an int) to the
                # preexisting entry with that item column.
                pass
            else:
                # create a new entry in the dictionary
                pass
    print(itemindex + "," + countindex)
    for key, value in sorted(dictionary.items()):
        print(key + "," + str(value))
The commented parts are where I am stuck.
Best answer
If you have access to it, you can use the pandas package to process the csv.
Given a text file named values.txt:
Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 1008
The desired data frame can be obtained as follows:
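import pandas as pd

# skipinitialspace strips the blank after each comma, so the second
# column is named 'Deaths' rather than ' Deaths'.
data = pd.read_csv('values.txt', skipinitialspace=True)
print(data)
# Sum the Deaths column for each distinct Leading Cause (groupby sorts the keys).
sum_data = data.groupby(['Leading Cause']).sum()
print(sum_data)
print(sum_data.loc['Diabetes Mellitus'])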
which outputs
             Leading Cause  Deaths
0        Diabetes Mellitus     123
1  Influenza and Pneumonia     325
2        Diabetes Mellitus    1008
                         Deaths
Leading Cause
Diabetes Mellitus          1131
Influenza and Pneumonia     325
Deaths    1131
Name: Diabetes Mellitus, dtype: int64
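If pandas is not available, the dictionary approach from the question can also be completed with the standard library alone. The following is only a minimal sketch, not the answerer's method: the helper name sum_by_item is hypothetical, and the sample data is the question's (with 100 rather than 1008) read from an in-memory string instead of a file.

```python
import csv
import io
from collections import defaultdict


def sum_by_item(lines, item_name, count_name):
    """Sum the count column per distinct item (hypothetical helper)."""
    # skipinitialspace drops the blank after each comma ("a, 1" -> "a", "1"),
    # so the header lookup below finds "Deaths" rather than " Deaths".
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    item_col = header.index(item_name)
    count_col = header.index(count_name)

    totals = defaultdict(int)  # missing keys start at 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        totals[row[item_col]] += int(row[count_col])
    # Return (item, total) pairs sorted alphabetically by item.
    return sorted(totals.items())


# Assumed sample input, taken from the question.
sample = io.StringIO(
    "Leading Cause, Deaths\n"
    "Diabetes Mellitus, 123\n"
    "Influenza and Pneumonia, 325\n"
    "Diabetes Mellitus, 100\n"
)
result = sum_by_item(sample, "Leading Cause", "Deaths")
print("Leading Cause,Deaths")
for item, total in result:
    print(f"{item},{total}")
```

To read a real file, an open file object can be passed in place of the StringIO, for example one obtained from open(sys.argv[1], newline='').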
Regarding "python - How to extract data from a column with a variable number of values per row in a CSV file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49265474/