python - Select elements from a list of lists based on length and intersection

l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r']]

l2 = [1, 1, 1, 2, 0, 0, 0]

I have two lists, as shown above. l1 is a list of lists, and l2 is another list holding a score for each list in l1.

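Problem: among the lists in l1 whose score (from l2) is 0, find those that either share no elements with any other such list or are the shortest among the lists they intersect with.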

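For example: if I have the lists [1, 2, 3], [2, 3], and [5, 7], all with score 0, I will select [5, 7] because its elements do not appear in any other list, and [2, 3] because it intersects with [1, 2, 3] but is shorter.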
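The rule in this example can be stated directly as a pairwise check. The following is only a minimal sketch of my reading of the rule, not the code from the question; the name `select_pairwise` is made up for illustration, and the check is quadratic in the number of lists.

```python
def select_pairwise(lists):
    """Keep a list unless some list sharing an element with it is strictly shorter."""
    sets = [set(lst) for lst in lists]
    kept = []
    for i, lst in enumerate(lists):
        # A list is "beaten" by any other list that intersects it and is shorter.
        beaten = any(
            i != j and sets[i] & sets[j] and len(lists[j]) < len(lst)
            for j in range(len(lists))
        )
        if not beaten:
            kept.append(lst)
    return kept


example = [[1, 2, 3], [2, 3], [5, 7]]
print(select_pairwise(example))  # [[2, 3], [5, 7]]
```

Under this reading, two intersecting lists of equal length are both kept, which matches the equal-length handling described in the edit further down.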

This is how I do it at the moment:

import itertools

l = [x for x, y in zip(l1, l2) if y == 0]
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
c = list(itertools.combinations(l, 2))

un_usable = []
usable = []
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection > 0:
        if len(i) < len(j):
            usable.append(i)
            un_usable.append(j)
        else:
            usable.append(j)
            un_usable.append(i)

for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection == 0:
        if i not in un_usable and i not in usable:
            usable.append(i)
        if j not in un_usable and j not in usable:
            usable.append(j)            

final = lx + [(x, 0) for x in usable]

This finally gives me:

[(['a', 'b', 'c'], 1),
 (['a', 'd', 'c'], 1),
 (['a', 'e'], 1),
 (['a', 'd', 'c'], 2),
 (['a', 'e'], 0),
 (['p', 'q', 'r'], 0)]

This is the desired result.

Edit: handling equal lengths:

l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r'],
      ['a', 'k']]

l2 = [1, 1, 1, 2, 0, 0, 0, 0]     

l = [x for x, y in zip(l1, l2) if y == 0]
lx = [(x, y) for x, y in zip(l1, l2) if y > 0]
c = list(itertools.combinations(l, 2))
un_usable = []
usable = []
for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection > 0:
        if len(i) < len(j):
            usable.append(i)
            un_usable.append(j)
        elif len(i) == len(j):
            usable.append(i)
            usable.append(j)
        else:
            usable.append(j)
            un_usable.append(i)

usable = [list(x) for x in set(tuple(x) for x in usable)]
un_usable = [list(x) for x in set(tuple(x) for x in un_usable)]

for i, j in c:
    intersection = len(set(i).intersection(set(j)))
    if intersection == 0:
        if i not in un_usable and i not in usable:
            usable.append(i)
        if j not in un_usable and j not in usable:
            usable.append(j)            

final = lx + [(x, 0) for x in usable]

Is there a better, faster, more Pythonic way to achieve the same thing?

Best answer

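Assuming I have understood everything correctly, here is a two-pass algorithm that runs in O(N), where N is the total number of elements in the zero-score lists.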

Steps:

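1. Pick out the lists with a score of zero.
2. For every element of every zero-score list, find the length of the shortest zero-score list in which that element occurs. Call this the element's length score.
3. For each list, take the minimum of the length scores of all its elements. If the result is less than the length of the list, discard the list.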
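To make the length score concrete, here is a small worked sketch on the three-list example from the question, assuming all three lists have score 0. The variable names `zero_lists`, `length_score`, and `kept` are chosen here for illustration only.

```python
# Worked example: length scores for the question's three zero-score lists.
zero_lists = [[1, 2, 3], [2, 3], [5, 7]]

# Pass 1: for each element, record the length of the shortest list containing it.
length_score = {}
for lst in zero_lists:
    for elem in lst:
        length_score[elem] = min(len(lst), length_score.get(elem, len(lst)))

print(length_score)  # {1: 3, 2: 2, 3: 2, 5: 2, 7: 2}

# Pass 2: keep a list only if none of its elements has a smaller length score
# than the list's own length.
kept = [lst for lst in zero_lists
        if all(length_score[e] >= len(lst) for e in lst)]

print(kept)  # [[2, 3], [5, 7]]
```

[1, 2, 3] is dropped because elements 2 and 3 have a length score of 2, which is less than its length of 3.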


def select_lsts(lsts, scores):
    # pick out zero score lists
    z_lsts = [lst for lst, score in zip(lsts, scores) if score == 0]

    # keep track of the shortest length of any list in which an element occurs
    len_shortest = dict()
    for lst in z_lsts:
        ln = len(lst)
        for c in lst:
            len_shortest[c] = min(ln, len_shortest.get(c, float('inf')))

    # check if the list is of minimum length for each of its chars
    for lst in z_lsts:
        len_lst = len(lst)
        if any(len_shortest[c] < len_lst for c in lst):
            continue

        yield lst
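As a rough usage sketch, the same two-pass idea can be combined with the non-zero-score pairs to rebuild the `final` list from the question. The helper name `build_final` is hypothetical, and the selection logic is inlined only so that the snippet runs on its own.

```python
def build_final(lists, scores):
    """Return (list, score) pairs: all non-zero pairs, then the selected zero-score lists."""
    nonzero = [(lst, s) for lst, s in zip(lists, scores) if s > 0]
    zero = [lst for lst, s in zip(lists, scores) if s == 0]

    # Pass 1: shortest zero-score list length seen for each element.
    shortest = {}
    for lst in zero:
        for elem in lst:
            shortest[elem] = min(len(lst), shortest.get(elem, len(lst)))

    # Pass 2: keep lists that are of minimum length for every one of their elements.
    kept = [lst for lst in zero if all(shortest[e] >= len(lst) for e in lst)]
    return nonzero + [(lst, 0) for lst in kept]


l1 = [['a', 'b', 'c'],
      ['a', 'd', 'c'],
      ['a', 'e'],
      ['a', 'd', 'c'],
      ['a', 'f', 'c'],
      ['a', 'e'],
      ['p', 'q', 'r']]
l2 = [1, 1, 1, 2, 0, 0, 0]

for pair in build_final(l1, l2):
    print(pair)
```

On this input the result matches the desired output shown in the question. Unlike the question's edited code, this sketch does not deduplicate identical zero-score lists; that step would have to be added separately if it is needed.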

For "python - Select elements from a list of lists based on length and intersection", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50618761/
