python - 将所有分区迭代到 k 组?

标签 python data-partitioning

假设我有一个列表 L。我怎样才能得到一个遍历 K 组所有分区的迭代器?

示例:L = [ 2,3,5,7,11, 13], K = 3

3组所有可能分区的列表:

[ [ 2 ], [ 3, 5], [ 7,11,13] ]
[ [ 2,3,5 ], [ 7, 11], [ 13] ]
[ [ 3, 11 ], [ 5, 7], [ 2, 13] ]
[ [ 3 ], [ 11 ], [ 5, 7, 2, 13] ]
etc...

===更新===

我正在研究一个似乎有效的解决方案,所以我将复制粘贴它

# -*- coding: utf-8 -*-

import itertools 

# return ( list1 - list0 )
def l1_sub_l0( l1, l0 ) :
    """Substract two lists"""
    #
    copy_l1 = list( l1 )
    copy_l0 = list( l0 )

    #
    for xx in l0 :
        #
        if copy_l1.count( xx ) > 0 :
            #
            copy_l1.remove( xx )
            copy_l0.remove( xx )

    #
    return [ copy_l1, copy_l0 ]


#
def gen_group_len( n, k ) :
    """Generate all possible group sizes"""

    # avoid doubles
    stop_list = []
    #
    for t in itertools.combinations_with_replacement( xrange( 1, n - 1 ), k - 1 ) :
        #
        last_n = n - sum( t )

        # valid group size
        if last_n  >= 1 :
            res = tuple( sorted( t + ( last_n, ) ) )
            #
            if res not in stop_list :
                yield res
                stop_list.append( res )


# group_len = (1, 1, 3)

def gen( group_len, my_list ) :
    """Generate all possible partitions of all possible group sizes"""

    #
    if len( group_len ) == 1 :
        yield ( tuple( my_list ), )

    #
    else :

        # need for a stop list if 2 groups of same size
        stop_list = []

        #
        for t in itertools.combinations( my_list, group_len[ 0 ] ) :
            #
            reduced_list = l1_sub_l0( my_list, t )[ 0 ]

            #
            for t2 in gen( group_len[ 1: ], reduced_list ) :
                #
                tmp = set( ( t, t2[ 0 ] ) )
                #
                if tmp not in stop_list :
                    yield ( t, ) + t2
                    # avoid doing same thing twice
                    if group_len[ 1 ] == group_len[ 0 ] :
                        stop_list.append( tmp )


#
my_list = [ 3,5,7,11,13 ]
n = len( my_list )
k = 3

#
group_len_list = list( gen_group_len( n, k ) )
print "for %i elements, %i configurations of group sizes" % ( n, len(  group_len_list ) )
print group_len_list

#
for group_len in group_len_list :
    #
    print "group sizes", group_len
    #
    for x in gen( group_len, my_list ) :
        print x
    #
    print "==="

输出:

for 5 elements, 2 configurations of group sizes
[(1, 1, 3), (1, 2, 2)]
group sizes (1, 1, 3)
((3,), (5,), (7, 11, 13))
((3,), (7,), (5, 11, 13))
((3,), (11,), (5, 7, 13))
((3,), (13,), (5, 7, 11))
((5,), (7,), (3, 11, 13))
((5,), (11,), (3, 7, 13))
((5,), (13,), (3, 7, 11))
((7,), (11,), (3, 5, 13))
((7,), (13,), (3, 5, 11))
((11,), (13,), (3, 5, 7))
===
group sizes (1, 2, 2)
((3,), (5, 7), (11, 13))
((3,), (5, 11), (7, 13))
((3,), (5, 13), (7, 11))
((5,), (3, 7), (11, 13))
((5,), (3, 11), (7, 13))
((5,), (3, 13), (7, 11))
((7,), (3, 5), (11, 13))
((7,), (3, 11), (5, 13))
((7,), (3, 13), (5, 11))
((11,), (3, 5), (7, 13))
((11,), (3, 7), (5, 13))
((11,), (3, 13), (5, 7))
((13,), (3, 5), (7, 11))
((13,), (3, 7), (5, 11))
((13,), (3, 11), (5, 7))
===

最佳答案

这是有效的,尽管它可能非常低效(我将它们全部排序以避免重复计算):

def clusters(l, K):
    if l:
        prev = None
        for t in clusters(l[1:], K):
            tup = sorted(t)
            if tup != prev:
                prev = tup
                for i in xrange(K):
                    yield tup[:i] + [[l[0]] + tup[i],] + tup[i+1:]
    else:
        yield [[] for _ in xrange(K)]

它还会返回空簇,因此您可能希望包装它以便仅获得非空簇:

def neclusters(l, K):
    for c in clusters(l, K):
        if all(x for x in c): yield c

计数只是为了检查:

def kamongn(n, k):
    res = 1
    for x in xrange(n-k, n):
        res *= x + 1
    for x in xrange(k):
        res /= x + 1
    return res

def Stirling(n, k):
    res = 0
    for j in xrange(k + 1):
        res += (-1)**(k-j) * kamongn(k, j) * j ** n
    for x in xrange(k):
        res /= x + 1
    return res

>>> sum(1 for _ in neclusters([2,3,5,7,11,13], K=3)) == Stirling(len([2,3,5,7,11,13]), k=3)
True

有效!

输出:

>>> clust = neclusters([2,3,5,7,11,13], K=3)
>>> [clust.next() for _ in xrange(5)]
[[[2, 3, 5, 7], [11], [13]], [[3, 5, 7], [2, 11], [13]], [[3, 5, 7], [11], [2, 13]], [[2, 3, 11], [5, 7], [13]], [[3, 11], [2, 5, 7], [13]]]

关于python - 将所有分区迭代到 k 组?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18353280/

相关文章:

python - 在 Django 中,将 urls.py 重定向到的 Python 文件放在哪里?

python - 如何在 Python 中读取 Excel 格式的日期?

r - 从一组集合中找到所有不相交(非重叠)的集合

python - 如何在 python 中 pickle 一个动态创建的嵌套类?

Python 正则表达式搜索和替换

MySQL 分区 : Performance increase For multiple partitioned tables. 为什么?

java - 将 float 组划分为相似的段(聚类)

json - 使用 jq 删除嵌套数组的匹配/不匹配元素

python - tweepy OAuthHandler 出错

集(图)分区对之间转换次数的算法