python - 将列表列表替换为 "condensed"列表列表,同时保持顺序

标签 python algorithm python-2.x nested-lists itertools

我有一个列表,如我附加的代码中所示。如果有任何共同值,我想链接每个子列表。然后我想用一个精简的列表列表替换列表列表。 示例:如果我有一个列表 [[1,2,3],[3,4]] 我想要 [1,2,3,4] 。如果我有 [[4,3],[1,2,3]] 我想要 [4,3,1,2]。如果我有 [[1,2,3],[a,b],[3,4],[b,c]] 我想要 [[1,2,3, 4],[a,b,c]][[a,b,c],[1,2,3,4]] 我不在乎哪个。

我快到了……

我的问题是当我遇到像 [[1,2,3],[10,5],[3,8,5]] 这样的案例时,我想要 [1, 2,3,10,5,8] 但使用我当前的代码,我得到 [1,2,3,8,10,5]

这是我的代码:

import itertools

a = [1,2,3]
b = [3,4]
i = [21,22]
c = [88,7,8]
e = [5,4]
d = [3, 50]
f = [8,9]
g=  [9,10]
h = [20,21]

lst = [a,b,c,i,e,d,f,g,h,a,c,i]*1000  
#I have a lot of list but not very many different lists

def any_overlap(a, b):
  sb = set(b)
  return any(itertools.imap(sb.__contains__, a))

def find_uniq(lst):
    ''' return the uniq parts of lst'''
    seen = set()
    seen_add = seen.add
    return [ x for x in lst if x not in seen and not seen_add(x)]

def overlap_inlist(o_lst, lstoflst):
    '''
    Search for overlap, using "any_overlap", of a list( o_lst) in a list of lists (lstoflst).
    If there is overlap add the uniq part of the found list to the search list, and keep 
    track of where that list was found 
    '''
    used_lst =[ ]
    n_lst =[ ]
    for lst_num, each_lst in enumerate(lstoflst):
        if any_overlap(o_lst,each_lst):
            n_lst.extend(each_lst)
            used_lst.append(lst_num)
    n_lst= find_uniq(n_lst)
    return  n_lst, used_lst

def comb_list(lst):
    '''
    For each list in a list of list find all the overlaps using 'ovelap_inlist'.
    Update the list each time to delete the found lists. Return the final combined list. 
    '''
    for updated_lst in lst:
        n_lst, used_lst = overlap_inlist(updated_lst,lst)
        lst[:] = [x for i,x in enumerate(lst) if i not in used_lst]
        lst.insert(0,n_lst)
    return lst
comb_lst = comb_list(lst)
print comb_lst

这个脚本的输出是:

[[88, 7, 8, 9, 10], [1, 2, 3, 4, 50, 5], [21, 22, 20]]

我想要它,所以 key 按原始顺序排列,例如:

[[88, 7, 8, 9, 10], [1, 2, 3, 4, 5, 50,], [21, 22, 20]]

5和50切换在新的lst[2]

我对 python 有点陌生。我将不胜感激任何问题的解决方案或对我当前代码的评论。我不是计算机科学家,我想可能有某种算法可以快速做到这一点(也许来自集合论?)。如果有这样的算法,请指出我正确的方向。

我可能会让这种方式变得更复杂,然后...... 谢谢!!

最佳答案

这是一种蛮力方法(可能更容易理解):

from itertools import chain

def condense(*lists):
    # remember original positions
    positions = {}
    for pos, item in enumerate(chain(*lists)):
        if item not in positions:
            positions[item] = pos

    # condense disregarding order
    sets = condense_sets(map(set, lists))

    # restore order
    result = [sorted(s, key=positions.get) for s in sets]
    return result if len(result) != 1 else result[0]

def condense_sets(sets):
    result = []
    for candidate in sets:
        for current in result:
            if candidate & current:   # found overlap
                current |= candidate  # combine (merge sets)

                # new items from candidate may create an overlap
                # between current set and the remaining result sets
                result = condense_sets(result) # merge such sets
                break
        else:  # no common elements found (or result is empty)
            result.append(candidate)
    return result

示例

>>> condense([1,2,3], [10,5], [3,8,5])
[1, 2, 3, 10, 5, 8]
>>> a = [1,2,3]
>>> b = [3,4]
>>> i = [21,22]
>>> c = [88,7,8]
>>> e = [5,4]
>>> d = [3, 50]
>>> f = [8,9]
>>> g=  [9,10]
>>> h = [20,21]
>>> condense(*[a,b,c,i,e,d,f,g,h,a,c,i]*1000)
[[1, 2, 3, 4, 5, 50], [88, 7, 8, 9, 10], [21, 22, 20]]
>>> condense([1], [2, 3, 2])
[[1], [2, 3]]

性能对比

condense_*() 函数来自这个问题的答案。 lst_OP 来自问题的输入列表(不同大小),lst_BK - 来自 @Blckknght's answer 的测试列表(不同大小)。见 the source .

测量表明,基于“不相交集”和“无向图的连通分量”概念的解决方案在两种类型的输入上表现相似。

name                       time ratio comment
condense_hynekcer     5.79 msec  1.00 lst_OP
condense_hynekcer2     7.4 msec  1.28 lst_OP
condense_pillmuncher2 11.5 msec  1.99 lst_OP
condense_blckknght    16.8 msec  2.91 lst_OP
condense_jfs            26 msec  4.49 lst_OP
condense_BK           30.5 msec  5.28 lst_OP
condense_blckknght2   30.9 msec  5.34 lst_OP
condense_blckknght3   32.5 msec  5.62 lst_OP


name                       time  ratio comment
condense_blckknght     964 usec   1.00 lst_BK
condense_blckknght2   1.41 msec   1.47 lst_BK
condense_blckknght3   1.46 msec   1.51 lst_BK
condense_hynekcer2    1.95 msec   2.03 lst_BK
condense_pillmuncher2  2.1 msec   2.18 lst_BK
condense_hynekcer     2.12 msec   2.20 lst_BK
condense_BK           3.39 msec   3.52 lst_BK
condense_jfs           544 msec 563.66 lst_BK


name                       time ratio comment
condense_hynekcer     6.86 msec  1.00 lst_rnd
condense_jfs          16.8 msec  2.44 lst_rnd
condense_blckknght    28.6 msec  4.16 lst_rnd
condense_blckknght2   56.1 msec  8.18 lst_rnd
condense_blckknght3   56.3 msec  8.21 lst_rnd
condense_BK           70.2 msec 10.23 lst_rnd
condense_pillmuncher2  324 msec 47.22 lst_rnd
condense_hynekcer2     334 msec 48.61 lst_rnd

重现结果clone gist并运行 measure-performance-condense-lists.py

关于python - 将列表列表替换为 "condensed"列表列表,同时保持顺序,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13714755/

相关文章:

python - 如何在Python中使用groupby删除列表中的重复项?

algorithm - 使用接受任意数量参数的 get/set 函数创建树

algorithm - Strassen 的算法效率帮助

algorithm - 具有交叉依赖性的最大流量减少

python - 如何增加列表中每个项目/元素的值?

python - 如何将日期时间插入Mysql?

python - 使用正则表达式从字符串中提取时间

python - Spark 作业在 rdd takeSample 上无限期挂起

python-3.x - 找不到Python2命令

python - 在 Python 2.x 中转换 u'\xe 0' to '\u00E0'?