python - 控制洗牌距离

我有tried to ask this question before ，但一直无法正确表达。我希望这次我做对了:

我有一个独特元素的列表。我想打乱这个列表以生成一个新列表。但是，我想限制随机播放，以便每个元素的新位置最多 d 远离其在列表中的原始位置。

例如:

L = [1,2,3,4]
d = 2
answer = magicFunction(L, d)

现在，一种可能的结果可能是:

>>> print(answer)
[3,1,2,4]

注意 3 移动了两个索引，1 和 2 移动了一个索引，4 移动了一个索引一点也没有动。因此，根据我之前的定义，这是一次有效的洗牌。以下代码片段可用于验证这一点:

old = {e:i for i,e in enumerate(L)}
new = {e:i for i,e in enumerate(answer)}
valid = all(abs(i-new[e])<=d for e,i in old.items())

现在，我可以轻松地生成 L 的所有可能排列，过滤出有效的排列，然后随机选择一个。但这似乎不是很优雅。有没有人对如何实现这一点有任何其他想法？

最佳答案

这将是漫长而干燥的。

我有一个产生均匀分布的解决方案。它需要O(len(L) * d**d)时间和空间进行预计算，然后在O(len(L)*d)时间¹。如果不需要均匀分布，则不需要预计算，由于更快的随机选择，洗牌时间可以减少到O(len(L))；我还没有实现非均匀分布。该算法的两个步骤都比蛮力快得多，但它们仍然不如我希望的那样好。此外，虽然这个概念应该可行，但我没有像我希望的那样彻底地测试我的实现。

假设我们从前面迭代L，为每个元素选择一个位置。将lag 定义为下一个要放置的元素与第一个未填充位置之间的距离。每次我们放置一个元素时，延迟最多增加一个，因为下一个元素的索引现在高了一个，但第一个未填充位置的索引不能变低。

只要延迟为 d，我们就被迫将下一个元素放在第一个未填充的位置，即使在 d 的距离内可能还有其他空位.如果我们这样做，延迟就不会超过 d，我们将始终有一个位置来放置每个元素，并且我们将生成一个有效的列表洗牌。因此，我们对如何生成随机播放有了一个大概的了解；但是，如果我们随机均匀地做出选择，则整体分布将不均匀。例如，对于 len(L) == 3 和 d == 1，有 3 种可能的洗牌(中间元素的每个位置一个)，但是如果我们统一选择第一个元素的位置，一个洗牌的可能性是其他任何一个的两倍。

如果我们想要对有效洗牌进行均匀分布，我们需要对每个元素的位置进行加权随机选择，如果我们选择该位置，则位置的权重基于可能的洗牌次数。如果天真地这样做，这将需要我们生成所有可能的洗牌来计算它们，这将花费 O(d**len(L)) 时间。然而，在算法的任何步骤之后剩余的可能洗牌数量仅取决于我们填充的哪些点，而不是它们填充的顺序。对于填充或未填充点的任何模式，数量可能的洗牌次数是下一个元素的每个可能位置的可能洗牌次数的总和。在任何一步，最多有 d 个可能的位置来放置下一个元素，并且有 O(d**d) 种可能的未填充点模式(因为任何点当前元素后面超过 d 的地方必须是满的，任何 d 或更前面的地方必须是空的)。我们可以使用它来生成大小为 O(len(L) * d**d) 的马尔可夫链，采用 O(len(L) * d**d) 这样做的时间，然后使用此马尔可夫链在 O(len(L)*d) 时间内执行洗牌。

示例代码(由于马尔可夫链表示效率低下，目前还不是O(len(L)*d)):

import random

# states are (k, filled_spots) tuples, where k is the index of the next
# element to place, and filled_spots is a tuple of booleans
# of length 2*d, representing whether each index from k-d to
# k+d-1 has an element in it. We pretend indices outside the array are
# full, for ease of representation.

def _successors(n, d, state):
    '''Yield all legal next filled_spots and the move that takes you there.

    Doesn't handle k=n.'''
    k, filled_spots = state
    next_k = k+1

    # If k+d is a valid index, this represents the empty spot there.
    possible_next_spot = (False,) if k + d < n else (True,)

    if not filled_spots[0]:
        # Must use that position.
        yield k-d, filled_spots[1:] + possible_next_spot
    else:
        # Can fill any empty spot within a distance d.
        shifted_filled_spots = list(filled_spots[1:] + possible_next_spot)
        for i, filled in enumerate(shifted_filled_spots):
            if not filled:
                successor_state = shifted_filled_spots[:]
                successor_state[i] = True
                yield next_k-d+i, tuple(successor_state)
                # next_k instead of k in that index computation, because
                # i is indexing relative to shifted_filled_spots instead
                # of filled_spots


def _markov_chain(n, d):
    '''Precompute a table of weights for generating shuffles.

    _markov_chain(n, d) produces a table that can be fed to
    _distance_limited_shuffle to permute lists of length n in such a way that
    no list element moves a distance of more than d from its initial spot,
    and all permutations satisfying this condition are equally likely.

    This is expensive.

    '''

    if d >= n - 1:
        # We don't need the table, and generating a table for d >= n
        # complicates the indexing a bit. It's too complicated already.
        return None

    table = {}
    termination_state = (n, (d*2 * (True,)))
    table[termination_state] = 1

    def possible_shuffles(state):
        try:
            return table[state]
        except KeyError:
            k, _ = state
            count = table[state] = sum(
                    possible_shuffles((k+1, next_filled_spots))
                    for (_, next_filled_spots) in _successors(n, d, state)
            )
            return count

    initial_state = (0, (d*(True,) + d*(False,)))
    possible_shuffles(initial_state)
    return table

def _distance_limited_shuffle(l, d, table):
    # Generate an index into the set of all permutations, then use the
    # markov chain to efficiently find which permutation we picked.

    n = len(l)

    if d >= n - 1:
        random.shuffle(l)
        return

    permutation = [None]*n
    state = (0, (d*(True,) + d*(False,)))
    permutations_to_skip = random.randrange(table[state])

    for i, item in enumerate(l):
        for placement_index, new_filled_spots in _successors(n, d, state):
            new_state = (i+1, new_filled_spots)
            if table[new_state] <= permutations_to_skip:
                permutations_to_skip -= table[new_state]
            else:
                state = new_state
                permutation[placement_index] = item
                break
    return permutation

class Shuffler(object):
    def __init__(self, n, d):
        self.n = n
        self.d = d
        self.table = _markov_chain(n, d)
    def shuffled(self, l):
        if len(l) != self.n:
            raise ValueError('Wrong input size')
        return _distance_limited_shuffle(l, self.d, self.table)
    __call__ = shuffled

¹我们可以使用基于树的加权随机选择算法将洗牌时间提高到 O(len(L)*log(d))，但是由于即使是中等大小的 d，该表也变得如此巨大，这似乎不值得。此外，边界中 d**d 的因子被高估了，但实际因子仍然至少是 d 的指数。

关于python - 控制洗牌距离，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/42981944/

python - 控制洗牌距离

上一篇：python - 在 Jupyter 中使用 plotly 在离线模式下创建动画图表

下一篇：python - 使用预训练的 Inception_v4 模型