Pythonic way to select list elements with different probabilities

import random
pos = ["A", "B", "C"]
x = random.choice(pos)

This code gives me "A", "B", or "C" with equal probability. Is there a nice way to express it when you want "A" with 30%, "B" with 40%, and "C" with 30% probability?

Best answer

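The weights define a probability distribution function (pdf). Random numbers from any such pdf can be generated by applying its associated inverse cumulative distribution function to uniform random numbers between 0 and 1.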

See also this SO explanation, or, as explained by Wikipedia:

If Y has a U[0,1] distribution then F⁻¹(Y) is distributed as F. This is used in random number generation using the inverse transform sampling-method.

import random
import bisect
import collections

def cdf(weights):
    total = sum(weights)
    result = []
    cumsum = 0
    for w in weights:
        cumsum += w
        result.append(cumsum / total)
    return result

def choice(population, weights):
    assert len(population) == len(weights)
    cdf_vals = cdf(weights)
    x = random.random()
    idx = bisect.bisect(cdf_vals, x)
    return population[idx]

weights=[0.3, 0.4, 0.3]
population = 'ABC'
counts = collections.defaultdict(int)
for i in range(10000):
    counts[choice(population, weights)] += 1
print(counts)

# % test.py
# defaultdict(<type 'int'>, {'A': 3066, 'C': 2964, 'B': 3970})

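The choice function above uses bisect.bisect, so the lookup of the weighted random variate is done in O(log n), where n is the length of weights. Note that, as written, choice also rebuilds the CDF on every call, which costs O(n); the O(log n) bound applies when the CDF is computed once and reused.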
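One way to get the O(log n) cost per draw in practice is to build the cumulative weights once and reuse them. The following is a minimal sketch under that assumption; the WeightedSampler class and its method names are hypothetical and not part of the original answer.

```python
import bisect
import itertools
import random


class WeightedSampler:
    """Hypothetical helper: build the cumulative weights once, then sample many times."""

    def __init__(self, population, weights):
        if len(population) != len(weights):
            raise ValueError("population and weights must have the same length")
        self.population = list(population)
        # Running totals of the raw weights. No normalization is needed,
        # because the uniform draw is scaled by the total instead.
        self.cum_weights = list(itertools.accumulate(weights))
        self.total = self.cum_weights[-1]

    def sample(self, rng=random):
        # Uniform draw in [0, total), then a binary search over the running totals.
        x = rng.random() * self.total
        idx = bisect.bisect(self.cum_weights, x)
        # Guard against floating-point rounding pushing idx past the last slot.
        return self.population[min(idx, len(self.population) - 1)]


sampler = WeightedSampler("ABC", [0.3, 0.4, 0.3])
draws = [sampler.sample() for _ in range(10000)]
print({letter: draws.count(letter) for letter in "ABC"})
```

Building the table is O(n) and happens once in the constructor; each call to sample then costs only the binary search.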

Note that as of version 1.7.0, NumPy has a Cythonized np.random.choice function. For example, this generates 1000 samples from the population [0,1,2,3], with weights [0.1, 0.2, 0.3, 0.4]:

import numpy as np
np.random.choice(4, 1000, p=[0.1, 0.2, 0.3, 0.4])

np.random.choice also has a replace parameter for sampling with or without replacement.
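For readers who prefer to stay within the standard library, random.choices (available since Python 3.6) accepts a weights argument and covers the with-replacement case. This is not part of the original answer; it is a minimal sketch that reuses the letters and weights from the question.

```python
import collections
import random

population = ["A", "B", "C"]
weights = [0.3, 0.4, 0.3]

# k is the number of draws; random.choices always samples with replacement.
samples = random.choices(population, weights=weights, k=10000)

# Tally the draws; the counts should come out roughly 30% / 40% / 30%.
counts = collections.Counter(samples)
print(counts)
```

The weights do not have to sum to 1; random.choices treats them as relative weights.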

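A theoretically better algorithm is the Alias Method. It builds a table, which takes O(n) time, but after that, samples can be drawn in O(1) time. So, if you need to draw many samples, the Alias Method may in theory be faster. There is a Python implementation of the Walker Alias Method here, and a numpy version here.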
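To make the table-building idea concrete, here is a rough sketch of the alias method in the variant usually attributed to Vose. It is not the Walker implementation or the numpy version mentioned above; the function names build_alias_table and alias_draw are hypothetical and chosen only for illustration.

```python
import random


def build_alias_table(weights):
    """Build the probability and alias tables in O(n)."""
    n = len(weights)
    total = sum(weights)
    # Scale the weights so that the average entry is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        sm = small.pop()
        lg = large.pop()
        prob[sm] = scaled[sm]
        alias[sm] = lg
        # Entry lg donates the mass needed to fill column sm up to 1.
        scaled[lg] = scaled[lg] + scaled[sm] - 1.0
        (small if scaled[lg] < 1.0 else large).append(lg)
    # Whatever is left over equals 1 up to rounding error.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng=random):
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


prob, alias = build_alias_table([0.3, 0.4, 0.3])
population = "ABC"
draws = [population[alias_draw(prob, alias)] for _ in range(10000)]
print({letter: draws.count(letter) for letter in population})
```

Each draw uses one random integer and one random float regardless of n, which is where the O(1) sampling cost comes from.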

Regarding a Pythonic way to select list elements with different probabilities, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4113307/
