python - Count duplicate values in a specific CSV column and write the count to another column (Python 2)

I am trying to count the repeated values in one column of a CSV file and write each count to another CSV column, using Python.

For example, my CSV file:

KeyID    GeneralID
145258   KL456
145259   BG486
145260   HJ789
145261   KL456

What I want is to count how many rows share the same GeneralID and insert that count into a new CSV column. For example:

KeyID    Total_GeneralID
145258   2
145259   1
145260   1
145261   2

I tried splitting each line into columns with the split method, but it did not work well.

My code:

case_id_list_data = []

with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
        #print case_id_list_data[0][0] #the result is dissatisfying 
        #I'm stuck here..

Best answer

If you would rather not use pandas and want to stay with the standard library:

Code:

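import csv
from collections import Counter

# Read the tab-separated input; 'rU' is Python 2's universal-newline mode.
with open('file1', 'rU') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    lines = [line for line in reader]
    # Count how often each GeneralID (the second column) occurs.
    counts = Counter([l[1] for l in lines])

# Append each row's GeneralID count as a new last column.
new_lines = [l + [str(counts[l[1]])] for l in lines]
# In Python 2 the csv module expects output files opened in binary mode.
with open('file2', 'wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)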

Result:

KeyID   GeneralID   Total_GeneralID
145258  KL456   2
145259  BG486   1
145260  HJ789   1
145261  KL456   2
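
The code above targets Python 2 ('rU' and 'wb' file modes). For readers on Python 3, the same idea might look roughly like the sketch below. It is not part of the original answer: the function name add_duplicate_counts and its parameters are hypothetical, and it assumes a tab-separated file with a header row that contains the column being counted.

```python
import csv
from collections import Counter


def add_duplicate_counts(in_path, out_path, column="GeneralID",
                         count_column="Total_GeneralID", delimiter="\t"):
    """Append a column holding how often each row's `column` value occurs.

    Hypothetical helper for illustration; assumes the input has a header row.
    """
    # In Python 3, newline="" lets the csv module handle line endings itself.
    with open(in_path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        rows = [row for row in reader if row]  # skip completely blank lines

    col = header.index(column)  # raises ValueError if the column is missing
    counts = Counter(row[col] for row in rows)

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(header + [count_column])
        # Each output row is the original row plus the count for its value.
        writer.writerows(row + [counts[row[col]]] for row in rows)
```

Calling add_duplicate_counts('file1', 'file2') on the sample data should produce the same three-column output shown above. Looking the column up by header name rather than by a fixed index is a design choice of this sketch, so it keeps working if the column order changes.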

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/43602222/
