python - How do I extract data from a column in a CSV file with a variable number of values per row?

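Basically, I am trying to add together the values in the count column of a CSV file for all rows that share the same name in the item column. I then need to sort the results alphabetically, in ascending order, by the item column value. For example: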

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100

I need to add the values 123 and 100 together to get a single new row for Diabetes Mellitus.

It should look like this:
Diabetes Mellitus, 223

Here is my code so far:

import csv
import sys

with open(sys.argv[1], 'r') as File:
    reader = csv.reader(File)
    itemindex = sys.argv[2]    # name of the item column
    countindex = sys.argv[3]   # name of the count column
    itemcolumn = 0
    countcolumn = 0
    firstrow = True
    dictionary = {}

    for row in reader:
        if firstrow:
            # The header row tells us where the two columns are.
            firstrow = False
            itemcolumn = row.index(itemindex)
            countcolumn = row.index(countindex)
            print(itemindex + "," + countindex)
        else:
            if row[itemcolumn] in dictionary:
                # Add the item at the row's count column (converted to an int) to the
                # preexisting entry with that item column.
                pass
            else:
                # Create a new entry in the dictionary.
                pass

# Print the totals sorted alphabetically by item.
for key, value in sorted(dictionary.items()):
    print(key + "," + str(value))

The commented parts are where I am stuck.
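One way to fill in the two commented branches using only the standard library is sketched below. It is not taken from the original thread: the function name `sum_counts` and the use of `collections.defaultdict` are assumptions made for illustration, and `skipinitialspace=True` is assumed because the sample file has a space after each comma.

```python
import csv
import sys
from collections import defaultdict


def sum_counts(path, item_name, count_name):
    """Return {item: total count} for the CSV file at path (hypothetical helper)."""
    totals = defaultdict(int)
    with open(path, newline="") as f:
        # skipinitialspace drops the blank after each comma,
        # so the header "Leading Cause, Deaths" yields "Deaths", not " Deaths".
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader)
        item_col = header.index(item_name)
        count_col = header.index(count_name)
        for row in reader:
            if not row:  # skip blank lines
                continue
            # A missing key starts at 0, so no separate "new entry" branch is needed.
            totals[row[item_col]] += int(row[count_col])
    return dict(totals)


def main(argv):
    path, item_name, count_name = argv[1:4]
    totals = sum_counts(path, item_name, count_name)
    print(item_name + "," + count_name)
    # Iterating over sorted(totals) visits the items in ascending alphabetical order.
    for item in sorted(totals):
        print(item + "," + str(totals[item]))


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

Assuming the script is saved under a name such as sum_counts.py, it would be run as: python sum_counts.py values.txt "Leading Cause" "Deaths".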

Best answer

If you have access to it, you can use the pandas package to process the CSV.

A text file named values.txt:

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 1008

The desired DataFrame can be produced as follows:

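import pandas as pd

# Load the file; the first line is used as the column headers.
data = pd.read_csv('values.txt')
print(data)

# Sum the remaining columns for each distinct 'Leading Cause' value.
sum_data = data.groupby(['Leading Cause']).sum()
print(sum_data)

# Look up the total for a single cause by its index label.
print(sum_data.loc['Diabetes Mellitus'])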

This outputs:

             Leading Cause   Deaths
0        Diabetes Mellitus      123
1  Influenza and Pneumonia      325
2        Diabetes Mellitus     1008

                          Deaths
Leading Cause                   
Diabetes Mellitus           1131
Influenza and Pneumonia      325

 Deaths    1131
Name: Diabetes Mellitus, dtype: int64
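The leading space in " Deaths" above comes from the space after the comma in the header line, which pandas keeps as part of the column name. The sketch below is a variation on the answer rather than part of it. It assumes `skipinitialspace=True` is acceptable for the file, and it uses an inline copy of the data through `io.StringIO` so that it runs without values.txt. It prints the totals in the "item,count" form the question asks for.

```python
import io

import pandas as pd

# Inline copy of values.txt so the sketch is self-contained.
raw = """Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 1008
"""

# skipinitialspace=True strips the blank after each comma, so the column
# is named "Deaths" rather than " Deaths".
data = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# groupby sorts the group keys by default, which gives alphabetical order.
totals = data.groupby("Leading Cause", as_index=False)["Deaths"].sum()

for cause, deaths in zip(totals["Leading Cause"], totals["Deaths"]):
    print(f"{cause},{deaths}")
```

Because `groupby` sorts its keys unless `sort=False` is passed, no separate sorting step is needed for the alphabetical ordering.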

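This question, "python - How do I extract data from a column in a CSV file with a variable number of values per row?", corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/49265474/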
