Python: find duplicates in a CSV and delete the oldest

I have a CSV file with entries like these, without a header row:

abcd,123,2017-09-27 17:38:38
cdfg,324,2017-09-27 18:38:38
abcd,123,2017-09-27 19:38:38
cdfg,423,2017-09-27 16:38:38

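I want to find duplicates in the first column and, for each duplicate, delete the older entry based on the datetime in the third column.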

The result should be:

abcd,123,2017-09-27 19:38:38
cdfg,324,2017-09-27 18:38:38

Any ideas?

Best answer

Using the csv module, which is part of the standard library, you can do this:

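import csv
from collections import OrderedDict
# A plain dict also keeps insertion order in Python 3.7+, and is fine
# on any version if the order of the rows does not matter.

# newline='' is what the csv module documentation recommends for csv files
with open('file.csv', newline='') as f:
    r = csv.reader(f)
    d = OrderedDict()
    for row in r:
        # Keep the row if its key is new or its timestamp is later than the
        # stored one; string comparison is chronological for this format.
        if row[0] not in d or d[row[0]][2] < row[2]:
            d[row[0]] = row

list(d.values())
# [['abcd', '123', '2017-09-27 19:38:38'], ['cdfg', '324', '2017-09-27 18:38:38']]

with open('file_out.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows(d.values())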
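Comparing the third column as plain strings works only because "YYYY-MM-DD HH:MM:SS" is fixed-width and zero-padded, so lexicographic order matches chronological order. If you would rather not rely on that, a minimal sketch that parses the timestamps with datetime.strptime could look like this. It assumes every row's third column uses exactly that format; the names keep_newest and TIMESTAMP_FORMAT are hypothetical, and io.StringIO stands in for the real input and output files.

```python
import csv
import io
from datetime import datetime

# Assumed layout of the third column (matches the sample data).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def keep_newest(rows):
    """Hypothetical helper: one row per first-column key, keeping the latest timestamp.

    Keys come out in the order they first appeared, because dicts
    preserve insertion order in Python 3.7+.
    """
    newest = {}
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        key = row[0]
        stamp = datetime.strptime(row[2], TIMESTAMP_FORMAT)
        # Replace the stored row only if this one is strictly newer
        if key not in newest or newest[key][0] < stamp:
            newest[key] = (stamp, row)
    return [row for _, row in newest.values()]


sample = io.StringIO(
    "abcd,123,2017-09-27 17:38:38\n"
    "cdfg,324,2017-09-27 18:38:38\n"
    "abcd,123,2017-09-27 19:38:38\n"
    "cdfg,423,2017-09-27 16:38:38\n"
)
result = keep_newest(csv.reader(sample))
print(result)
# [['abcd', '123', '2017-09-27 19:38:38'], ['cdfg', '324', '2017-09-27 18:38:38']]

# Write the deduplicated rows back out (use open(..., 'w', newline='') for a real file)
out = io.StringIO(newline="")
csv.writer(out).writerows(result)
```

As in the answer, when two rows share a key and an identical timestamp, the first one read is kept.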

A similar question, "Python: find duplicates in a CSV and delete the oldest", can be found on Stack Overflow: https://stackoverflow.com/questions/46466761/
