python - Finding the K-th lexicographic permutation with repetition allowed, using the Factoradic system

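Is it possible to find the k-th permutation using the Factoradic Base System when repetition is allowed?
To find the K-th permutation without repetition, I can do something like this in Python: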

def factorial(n):
    if n == 0: return 1
    return n*factorial(n-1)

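def unrank(S, k, i):
    """Return the i-th (0-based) k-permutation of S, without repetition."""
    S = list(S)   # make a copy to avoid destroying the list
    n = len(S)
    nb = factorial(n) // factorial(n-k)   # number of k-permutations of n items
    print(nb)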
    if i >= nb:
        raise IndexError
    res = []
    while k > 0:
        nb = nb // n
        pos = i // nb   # the factoradic digits
        i = i % nb      # the remaining digits
        res.append(S[pos])
        del S[pos]
        k = k-1
        n = n-1
    return res

res = unrank(list('ABCDEFGHJKLMNPQRSTUVWXYZ0123456789'), 3, 2222)
print(res)


Best answer

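Short answer: No, I don't see a way to use the Factoradic Base System to do what you want, but there is a simpler way. Just use something like an ordinary number base.

Your terminology is confusing, because you write "permutation" but allow repetition. Let's call them sequences instead, where the function is given a test sequence to examine and a base sequence containing the characters that may be used. You want to find the count (that is, the position) of the test sequence in the lexicographic listing of all possible sequences of the same length built from the characters in the base sequence.

For convenience, let's assume the base sequence is in increasing order and has no repetitions, as in your example code.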

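For each character in the test sequence, we want to know where it appears in the base sequence. If both the base sequence and the test sequence are long, the simple way of doing this can be time-consuming: its running time is on the order of the product of the two lengths. There is a way to do it in time on the order of the sum of the lengths. First preprocess the base sequence to get a dictionary that maps each character to its position in the base sequence, then convert each character in the test sequence to its position in the base sequence. We now have a list of the positions of the characters in the base sequence.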
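As a rough illustration of the difference between the two lookup strategies, here is a minimal sketch. The variable names (`slow_positions`, `item_pos`, `fast_positions`) are made up for this example, and the 36-character base sequence is only an assumed sample.

```python
base_seq = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'  # assumed sample alphabet
test_seq = 'ABC'

# Simple way: every .index() call scans base_seq from the start, so the
# total work grows with len(test_seq) * len(base_seq).
slow_positions = [base_seq.index(item) for item in test_seq]

# Preprocessed way: build the lookup table once (one pass over base_seq),
# then each lookup is a dictionary access, which is constant time on average.
item_pos = {item: pos for pos, item in enumerate(base_seq)}
fast_positions = [item_pos[item] for item in test_seq]

print(slow_positions)  # [0, 1, 2]
print(fast_positions)  # [0, 1, 2]
```

For short inputs like these the difference is negligible; the dictionary only pays off when both sequences are long.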

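This list is like a number written in base N, where N is the length of the base sequence. We then convert it to a standard integer in the usual way, and that integer is the result we want.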
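Written out as a formula, the conversion looks like the following. The symbols $p_j$ (the position of the $j$-th character of the test sequence) and $L$ (the length of the test sequence) are notation introduced here for illustration, not taken from the answer.

```latex
\text{position} \;=\; \sum_{j=0}^{L-1} p_j \, N^{\,L-1-j}
\qquad\text{e.g. for test sequence ABC with } N = 36:\quad
0 \cdot 36^2 + 1 \cdot 36 + 2 = 38
```

The worked example assumes a 36-character base sequence that starts with A, B, C, so the positions are 0, 1 and 2.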

Here is some code that does all of this. There are, of course, other ways to do it.

def sequence_position(test_seq, base_seq):
    """Return the count of the test sequence in the lexicographical
    listing of all possible sequences of the same length using the 
    items in the base sequence. Repetition of items is allowed and the 
    order of the items in the list matters.

    This function assumes the base sequence is in increasing order and
    has no repetitions.
    """
    # Create a dictionary mapping items in the base sequence to
    #   their positions in the base sequence.
    item_pos_dict = {item:pos for pos,item in enumerate(base_seq)}
    # Create a list of positions of the characters in the test sequence.
    positions = [item_pos_dict[item] for item in test_seq]
    # Convert this list of positions to its count in the lexicographical
    #   sequence of all such sequences of this length
    base = len(base_seq)
    result = 0
    for pos in positions:
        result = result * base + pos
    return result

print(sequence_position('ABC', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'))
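The function above goes from a sequence to its position. The question asked for the opposite direction (from k to the k-th sequence), which can be handled with the same base-N idea run in reverse. The following is only a sketch under the same assumptions (base sequence in increasing order, no repeated items); the name `sequence_at` and its parameters are hypothetical and not part of the original answer.

```python
from itertools import product

def sequence_at(k, length, base_seq):
    """Return the k-th (0-based) sequence of the given length, in
    lexicographic order, using items of base_seq with repetition allowed.

    Hypothetical helper: assumes base_seq is in increasing order and
    has no repetitions.
    """
    base = len(base_seq)
    if not 0 <= k < base ** length:
        raise IndexError("k is out of range")
    items = []
    for _ in range(length):
        k, pos = divmod(k, base)   # peel off the least significant digit
        items.append(base_seq[pos])
    items.reverse()                # most significant digit goes first
    return items

print(sequence_at(38, 3, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'))  # ['A', 'B', 'C']

# Cross-check against brute-force enumeration on a tiny alphabet.
small = 'ABC'
expected = list(product(small, repeat=2))
assert [tuple(sequence_at(k, 2, small)) for k in range(9)] == expected
```

Unlike the factoradic version, the base stays fixed at every step because no item is removed after it is used.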

Regarding python - finding the K-th lexicographic permutation with repetition allowed, using the Factoradic system, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47740459/
