python - Remove duplicates from a large list, but remove both copies when a duplicate exists?

I have a text file like this:

123
1234
123
1234
12345
123456

As you can see, 123 appears twice, so both instances should be removed. 12345 appears only once, so it stays. My text file has about 70,000 lines.

Here is what I came up with.

file = open("test.txt",'r')
lines = file.read().splitlines() #to ignore the '\n' and turn to list structure
for appId in lines:
    if(lines.count(appId) > 1):  #if element count is not unique remove both elements
        lines.remove(appId)      #first instance removed
        lines.remove(appId)      #second instance removed


writeFile = open("duplicatesRemoved.txt",'a') #output the left over unique elements to file
for element in lines:
    writeFile.write(element + "\n")

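My logic looks correct to me, but when I run it the result is wrong: I know the output should have around 950 elements, yet it still has 23,000, so many elements are not being removed. Any ideas where the bug might be?

Edit: I forgot to mention that an element can appear at most twice.

Best answer

Use Counter from the built-in collections module: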

In [1]: from collections import Counter

In [2]: a = [123, 1234, 123, 1234, 12345, 123456]

In [3]: a = Counter(a)

In [4]: a
Out[4]: Counter({123: 2, 1234: 2, 12345: 1, 123456: 1})


In [5]: a = [k for k, v in a.items() if v == 1]

In [6]: a
Out[6]: [12345, 123456]
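
The same idea can be wrapped in a small function. This is only a sketch: the name `keep_singletons` is hypothetical and not part of the answer, and the sample uses strings because lines read from a file are strings rather than integers. Filtering the original sequence against the counts keeps the surviving items in their original order.

```python
from collections import Counter

def keep_singletons(items):
    """Return the items that occur exactly once, in their original order."""
    counts = Counter(items)  # one pass to count every item
    return [item for item in items if counts[item] == 1]

print(keep_singletons(["123", "1234", "123", "1234", "12345", "123456"]))
# prints ['12345', '123456']
```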

For your specific problem, I would do this:

from collections import defaultdict
out = defaultdict(int)
with open('input.txt') as f:
    for line in f:
        out[line.strip()] += 1
with open('out.txt', 'w') as f:
    for k, v in out.items():
        if v == 1:  # here you use logic suitable for what you want
            f.write(k + '\n')
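
As for why the loop in the question leaves duplicates behind: it removes elements from `lines` while a `for` loop is iterating over that same list. Each removal shifts the remaining elements to the left, but the loop's position keeps advancing, so some elements are never visited. In addition, `list.count` and `list.remove` each scan the list, which makes the approach quadratic on a 70,000-line file. The sketch below reproduces the effect on a small made-up input; the function name `remove_pairs_buggy` and the sample data are illustrative assumptions, not from the original post.

```python
from collections import Counter

def remove_pairs_buggy(lines):
    """Same logic as the question's loop: mutates the list while iterating."""
    lines = list(lines)  # work on a copy so the caller's list is untouched
    for app_id in lines:
        if lines.count(app_id) > 1:
            lines.remove(app_id)  # removes the first occurrence
            lines.remove(app_id)  # removes the second occurrence
    return lines

sample = ["a", "a", "b", "c", "b", "c"]  # every value appears exactly twice

# Counting first and filtering afterwards never mutates the list being iterated.
counts = Counter(sample)
expected = [x for x in sample if counts[x] == 1]

print(remove_pairs_buggy(sample))  # prints ['b', 'b']
print(expected)                    # prints []
```

Here the loop removes both copies of "a", then jumps straight to "c" and skips "b" entirely, so the two "b" entries survive even though they are duplicates.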

Regarding "python - Remove duplicates from a large list, but remove both copies when a duplicate exists?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59203605/
