python - Looping through parallel lists and removing matches until no more matches exist

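I have 3 parallel lists representing a 3-tuple (date, description, amount), plus 3 new lists, and I need to merge them without creating duplicate entries. Yes, the lists have overlapping entries, but the duplicates are not grouped together (it is not the case that all duplicates sit at 0 through x and all new entries at x through the end).

The problem I am running into is iterating the right number of times to make sure every duplicate is caught. Instead, my code moves on while duplicates still remain.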

for x in dates:
    MoveNext = 'false'
    while MoveNext == 'false':
        Reiterate = 'false'
        for a, b in enumerate(descriptions):
            if Reiterate == 'true':
                break
            if b in edescriptions:
                eindex = [c for c, d in enumerate(edescriptions) if d == b]
                for e, f in enumerate(eindex):
                    if Reiterate == 'true':
                        break
                    if edates[f] == dates[a]:
                        if eamounts[f] == amounts[a]:
                            del dates[a]
                            del edates[f]
                            del descriptions[a]
                            del edescriptions[f]
                            del amounts[a]
                            del eamounts[f]
                            Reiterate = 'true'
                            break
                        else:
                            MoveNext = 'true'
                    else:
                        MoveNext = 'true'
            else:
                MoveNext = 'true'

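I don't know whether it is a coincidence, but right now exactly half of the new items get removed while the other half remain. In reality, far fewer should be left over. This makes me think that for x in dates: is not iterating the correct number of times.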

Best answer

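I suggest a different approach: rather than trying to remove items from a list (or worse, from several parallel lists), run through the input and yield only the data that passes your test, which in this case is data you have not seen before. This is much easier with a single input stream.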
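To illustrate why in-place deletion is fragile, here is a minimal sketch on made-up data (the names shrinking, existing and kept are hypothetical, not from the question). Deleting from a list while iterating over it skips the element right after each deletion, which may be related to the "exactly half" symptom described in the question; building a filtered list instead visits every element once.

```python
# Hypothetical data: every item is a "duplicate" that should be removed.
shrinking = ['a', 'b', 'c', 'd']
for index, item in enumerate(shrinking):
    # Each deletion shifts the later items left, so the loop skips the next one.
    del shrinking[index]
print(shrinking)  # ['b', 'd']: half of the items survive

# Filtering into a new list looks at every item exactly once.
existing = {'a', 'b', 'c', 'd'}
kept = [item for item in ['a', 'b', 'c', 'd'] if item not in existing]
print(kept)  # []
```

The second form never mutates the list being iterated, so there is no index bookkeeping to get wrong.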

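Your lists of data are crying out to be made into objects, since each piece of data (a date, say) is meaningless without the other two... at least for your current purposes. Below, I first combine each triplet into a Record instance, which is a collections.namedtuple. Named tuples are well suited to this kind of one-off job.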

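In the program below, build_records creates Record objects from the three input lists. dedup_records merges multiple streams of Record objects, using unique to filter out duplicates. Keeping each function small (most of the main function is test data) makes each step easy to test.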

#!/usr/bin/env python3

import collections
import itertools


Record = collections.namedtuple('Record', ['date', 'description', 'amount'])


def unique(records):
    '''
    Yields only the unique Records in the given iterable of Records.
    '''
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record
    return


def dedup_records(*record_iterables):
    '''
    Yields unique Records from multiple iterables of Records, preserving the
    order of first appearance.
    '''
    all_records = itertools.chain(*record_iterables)
    yield from unique(all_records)
    return


def build_records(dates, descriptions, amounts):
    '''
    Yields Record objects built from each date-description-amount triplet.
    '''
    for args in zip(dates, descriptions, amounts):
        yield Record(*args)
    return


def main():
    # Sample data
    dates_old = [
      '2000-01-01',
      '2001-01-01',
      '2002-01-01',
      '2003-01-01',
      '2000-01-01',
      '2001-01-01',
      '2002-01-01',
      '2003-01-01',
      ]
    dates_new = [
      '2000-01-01',
      '2001-01-01',
      '2002-01-01',
      '2003-01-01',
      '2003-01-01',
      '2002-01-01',
      '2001-01-01',
      '2000-01-01',
      ]
    descriptions_old = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
    descriptions_new = ['b', 'b', 'c', 'a', 'a', 'c', 'd', 'd']
    amounts_old = [0, 1, 0, 1, 0, 1, 0, 1]
    amounts_new = [0, 0, 0, 0, 1, 1, 1, 1]
    old = [dates_old, descriptions_old, amounts_old]
    new = [dates_new, descriptions_new, amounts_new]

    for record in dedup_records(build_records(*old), build_records(*new)):
        print(record)
    return


if '__main__' == __name__:
    main()

This reduces the 16 input records to 11:

Record(date='2000-01-01', description='a', amount=0)
Record(date='2001-01-01', description='b', amount=1)
Record(date='2002-01-01', description='c', amount=0)
Record(date='2003-01-01', description='d', amount=1)
Record(date='2000-01-01', description='b', amount=0)
Record(date='2001-01-01', description='b', amount=0)
Record(date='2003-01-01', description='a', amount=0)
Record(date='2003-01-01', description='a', amount=1)
Record(date='2002-01-01', description='c', amount=1)
Record(date='2001-01-01', description='d', amount=1)
Record(date='2000-01-01', description='d', amount=1)
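If the rest of your code still expects three parallel lists, the records can be transposed back. This is a minimal sketch, assuming a small hand-written list of records standing in for the deduplicated output; the variable names are illustrative only.

```python
import collections

Record = collections.namedtuple('Record', ['date', 'description', 'amount'])

# Hypothetical deduplicated records, standing in for the output of dedup_records.
records = [
    Record('2000-01-01', 'a', 0),
    Record('2001-01-01', 'b', 1),
    Record('2000-01-01', 'b', 0),
]

# zip(*records) transposes the rows into three columns (as tuples).
dates, descriptions, amounts = (list(column) for column in zip(*records))
print(dates)         # ['2000-01-01', '2001-01-01', '2000-01-01']
print(descriptions)  # ['a', 'b', 'b']
print(amounts)       # [0, 1, 0]
```

Note that the unpacking raises ValueError when records is empty, so that case would need to be handled separately.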

Note that the yield from ... syntax requires Python 3.3 or later.
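On an older interpreter, the delegation can be written as an explicit loop instead. The following is a sketch of that variant, not part of the original program; for plain iteration like this it should behave the same as yield from. Plain integers stand in for Record objects to keep it short.

```python
import itertools


def unique(items):
    # Yield each item the first time it is seen, preserving order.
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def dedup_records(*record_iterables):
    # Explicit loop in place of `yield from unique(...)`.
    for record in unique(itertools.chain(*record_iterables)):
        yield record


# Integers stand in for Record objects here.
merged = list(dedup_records([1, 2, 2], [2, 3]))
print(merged)  # [1, 2, 3]
```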

Regarding "python - Looping through parallel lists and removing matches until no more matches exist", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39049905/
