python - 从元组列表中删除重复项

标签 python

我有一个不幸包含重复的元组列表,如下所示:

[(67, u'top-coldestcitiesinamerica'), (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'), (65, u'a-b-c-ca-d-ab-ea-d-c-c'), (64, u'a-b-c-ca-d-ab-ea-d-c-c'), (63, u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'), (62, u'ghgemissions'), (61, u'top-coldestcitiesinamerica'), (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'), (57, u'culture'), (55, u'cas-k-ihaveanidea'), (54, u'trendsfor'), (53, u'batteryimpedance'), (52, u'evs-howey-full'), (51, u'bericht'), (49, u'classiccarinsurance'), (47, u'uploaded_file'), (46, u'x_file'), (45, u's-s-main'), (44, u'vehicle-propulsion'), (43, u'x_file')]

问题是元组的第一个元素(基于 0 的排序)是我要检查重复项的条目。所以,我可以看到:

(67, u'top-coldestcitiesinamerica')
(61, u'top-coldestcitiesinamerica')

..是重复项,我想删除其中一个(类似于)。所以,最后,我想要一个干净的元组列表,没有像这样的重复项(即元组的第一个元素没有重复项):

[(67, u'top-coldestcitiesinamerica'), (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'), (65, u'a-b-c-ca-d-ab-ea-d-c-c') (63, u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'), (62, u'ghgemissions'), (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'), (57, u'culture'), (55, u'cas-k-ihaveanidea'), (54, u'trendsfor'), (53, u'batteryimpedance'), (52, u'evs-howey-full'), (51, u'bericht'), (49, u'classiccarinsurance'), (47, u'uploaded_file'), (46, u'x_file'), (45, u's-s-main'), (44, u'vehicle-propulsion')]

我怎样才能以 pythonic 方式实现这一点? 谢谢!

最佳答案

您可以使用 How do you remove duplicates from a list in whilst preserving order? 中的 set 方法,使用 x[1] 作为唯一标识符:

def unique_second_element(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x[1] in seen or seen_add(x[1]))]

请注意,如果您希望保留最后 出现,也显示了 OrderedDict 方法;对于第一次出现,您必须反转输入然后再次反转输出。

您可以通过支持 key 函数使其更加通用:

def unique_preserve_order(seq, key=None):
    if key is None:
        key = lambda elem: elem
    seen = set()
    seen_add = seen.add
    augmented = ((key(x), x) for x in seq)
    return [x for k, x in augmented if not (k in seen or seen_add(k))]

然后使用

import operator

unique_preserve_order(yourlist, key=operator.itemgetter(1))

演示:

>>> def unique_preserve_order(seq, key=None):
...     if key is None:
...         key = lambda elem: elem
...     seen = set()
...     seen_add = seen.add
...     augmented = ((key(x), x) for x in seq)
...     return [x for k, x in augmented if not (k in seen or seen_add(k))]
... 
>>> from pprint import pprint
>>> import operator
>>> yourlist = [(67, u'top-coldestcitiesinamerica'), (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'), (65, u'a-b-c-ca-d-ab-ea-d-c-c'), (64, u'a-b-c-ca-d-ab-ea-d-c-c'), (63, u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'), (62, u'ghgemissions'), (61, u'top-coldestcitiesinamerica'), (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'), (57, u'culture'), (55, u'cas-k-ihaveanidea'), (54, u'trendsfor'), (53, u'batteryimpedance'), (52, u'evs-howey-full'), (51, u'bericht'), (49, u'classiccarinsurance'), (47, u'uploaded_file'), (46, u'x_file'), (45, u's-s-main'), (44, u'vehicle-propulsion'), (43, u'x_file')]
>>> pprint(unique_preserve_order(yourlist, operator.itemgetter(1)))
[(67, u'top-coldestcitiesinamerica'),
 (66, u'ecofriendlyideastocelebrateindependenceday-phpapp'),
 (65, u'a-b-c-ca-d-ab-ea-d-c-c'),
 (63,
  u'alexandre-meybeck-faowhatisclimate-smartagriculture-backgroundopportunitiesandchallenges'),
 (62, u'ghgemissions'),
 (58, u'infographicthe-stateofdigitaltransformationaltimetergroup'),
 (57, u'culture'),
 (55, u'cas-k-ihaveanidea'),
 (54, u'trendsfor'),
 (53, u'batteryimpedance'),
 (52, u'evs-howey-full'),
 (51, u'bericht'),
 (49, u'classiccarinsurance'),
 (47, u'uploaded_file'),
 (46, u'x_file'),
 (45, u's-s-main'),
 (44, u'vehicle-propulsion')]

关于python - 从元组列表中删除重复项,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/28833901/

相关文章:

python - 如何使用 itertools.groupby 格式化字典列表?

python - python 中的错误栏 :ErrorbarContainer object of 3 artists

Python IMAP 搜索部分主题

python - 什么时候应该使用 Popen() 什么时候应该使用 call()?

python - 将字符串连接到列表

python - 多索引切片(涉及时间序列/日期范围)不适用于 DataFrame 但适用于 Series

Python:如何找到一年中的第 n 个工作日?

python - Django:阻止访问一个应用程序中的特定用户

python - @jupyterlab/vega6-extension 在哪里?

python - 如何在 GPU 上有效并行化 AlphaZero?