python - Extract information from multiple CSV files and write a new CSV with a third column

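I have a folder containing four CSV files. Each CSV lists animals, and an animal can appear in more than one file. I'm trying to create a CSV that collects the information from all the CSVs in the folder, removes duplicates, and adds a third column listing the original files in which each animal was found. For example: lion,4,'file2, file4'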
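The example row quotes its third column because that field contains a comma. As a minimal sketch (standard csv module only, with an in-memory buffer standing in for the output file), csv.writer applies this quoting automatically, although it uses double quotes rather than the single quotes shown in the example:

```python
import csv
import io

# Write the example row from the question to an in-memory buffer
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['lion', 4, 'file2, file4'])

# The third field contains a comma, so csv.writer wraps it in double quotes
print(repr(buffer.getvalue()))  # 'lion,4,"file2, file4"\r\n'

# Reading it back yields three fields; the quoted comma stays inside the third
row = next(csv.reader(io.StringIO(buffer.getvalue())))
print(row)  # ['lion', '4', 'file2, file4']
```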

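I'd really like my new CSV to have a third column listing which files contain each animal, but I can't figure it out. I tried to do it with a second dictionary (see the lines that use locationCount). The script I'm currently using is below.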

My files:

file1.csv:
cat,1
dog,2
bird,1
rat,3

file2.csv:
bear,1
lion,1
goat,1
pig,1

file3.csv:
rat,1
bear,1
mouse,1
cat,1

file4.csv:
elephant,1
tiger,2
dog,1
lion,3

Current script:

import glob
import os
import csv

listCSV = glob.glob('*.csv')
masterCount = {}
locationCount = {}
for i in listCSV:  # iterate over each csv
    filename = os.path.split(i)[1]  # filename for each csv
    with open(i, newline='') as f:  # Python 3 text mode ('rb' was Python 2)
        reader = csv.reader(f)
        location = []
        for row in reader:
            key = row[0]
            location.append(filename)
            masterCount[key] = masterCount.get(key, 0) + int(row[1])
            # Keeps only the first list seen for each animal, and that list
            # is shared by every animal in the same file
            locationCount[key] = locationCount.get(key, location)
with open('MasterAnimalCount.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for key, value in masterCount.items():
        writer.writerow([key, value])
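To see why the locationCount attempt does not produce the file list, here is a minimal reproduction of the same loop, assuming the rows of file1.csv and file3.csv are held in memory instead of being read from disk. Each file gets one location list that is shared by every animal in that file, and get() returns the list already stored for an animal, so later files are never recorded:

```python
# Sample rows from file1.csv and file3.csv, held in memory for illustration
files = {
    'file1.csv': [['cat', '1'], ['dog', '2'], ['bird', '1'], ['rat', '3']],
    'file3.csv': [['rat', '1'], ['bear', '1'], ['mouse', '1'], ['cat', '1']],
}

masterCount = {}
locationCount = {}
for filename, rows in files.items():
    location = []  # one list per file, shared by every row of that file
    for row in rows:
        key = row[0]
        location.append(filename)
        masterCount[key] = masterCount.get(key, 0) + int(row[1])
        # If key is already present, get() returns its existing list,
        # so file3.csv is never recorded for rat or cat
        locationCount[key] = locationCount.get(key, location)

print(locationCount['rat'])  # ['file1.csv', 'file1.csv', 'file1.csv', 'file1.csv']
print(locationCount['cat'] is locationCount['dog'])  # True
```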

Best answer

You're almost there: handle the locations the same way you handle the counts.

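I've renamed and rearranged a few things, but it's essentially the same code structure. masterCount adds each count to the running total, and masterLocations appends each filename to the list of filenames collected so far.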

from glob import glob
import os
import csv

masterCount = {}
masterLocations = {}

for i in glob('*.csv'):
    filename = os.path.split(i)[1]
    if filename == 'MasterAnimalCount.csv':
        continue  # skip the output file left by a previous run

    with open(i, newline='') as f:
        for animal, count in csv.reader(f):
            # Add to the running count and to the list of files for this animal
            masterCount[animal] = masterCount.get(animal, 0) + int(count)
            masterLocations[animal] = masterLocations.get(animal, []) + [filename]

with open('MasterAnimalCount.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for animal in masterCount:
        writer.writerow([animal, masterCount[animal], ', '.join(masterLocations[animal])])
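A variant of the same approach, sketched under the assumption that the four sample files are held in memory as strings (SAMPLE_FILES, merge_counts and write_master are hypothetical names chosen for illustration). It replaces the .get() calls with collections.defaultdict, records each file only once per animal, and sorts the output by animal name:

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-ins for file1.csv ... file4.csv from the question
SAMPLE_FILES = {
    'file1.csv': 'cat,1\ndog,2\nbird,1\nrat,3\n',
    'file2.csv': 'bear,1\nlion,1\ngoat,1\npig,1\n',
    'file3.csv': 'rat,1\nbear,1\nmouse,1\ncat,1\n',
    'file4.csv': 'elephant,1\ntiger,2\ndog,1\nlion,3\n',
}


def merge_counts(files):
    """Sum counts per animal and record which files each animal appears in."""
    counts = defaultdict(int)
    locations = defaultdict(list)
    for filename, text in files.items():
        for animal, count in csv.reader(io.StringIO(text)):
            counts[animal] += int(count)
            # Record each file once, even if an animal repeats within it
            if filename not in locations[animal]:
                locations[animal].append(filename)
    return counts, locations


def write_master(counts, locations, out):
    """Write animal, total count, and a comma-separated list of files."""
    writer = csv.writer(out)
    for animal in sorted(counts):
        writer.writerow([animal, counts[animal], ', '.join(locations[animal])])


counts, locations = merge_counts(SAMPLE_FILES)
out = io.StringIO()
write_master(counts, locations, out)
print(out.getvalue())  # includes the row: lion,4,"file2.csv, file4.csv"
```

To run it against real files, build the dictionary from glob('*.csv'), reading each file with open(path, newline='') and skipping MasterAnimalCount.csv. The `not in` check scans a short list, which is fine for a handful of files.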

For a similar question on extracting information from multiple CSV files and writing a new CSV with a third column, see Stack Overflow: https://stackoverflow.com/questions/26987142/
