I have data containing a variable z with about 4000 values (ranging from 0.0 to 1.0), whose histogram is shown below.

[Image: histogram of z]

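Now I need to generate a random variable, call it random_z, that replicates the distribution above.

What I have tried so far is to generate a normal distribution centered at 1.0, so that I can remove all values above 1.0 and get a similar distribution. I have been using numpy.random.normal, but the problem is that I cannot set the range to 0.0 to 1.0, because the normal distribution usually has mean = 0.0 and standard deviation = 1.0.

Is there another way to generate this distribution in Python?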

Best answer

If you want to bootstrap, you can use random.choice() on your observed series.
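A minimal sketch of the bootstrap option, assuming the data is held in a NumPy array. The array z here is a made-up stand-in for the real data, and NumPy's Generator.choice is used in place of the standard-library random.choice() so that many values can be drawn in one call:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Hypothetical stand-in for the observed data (about 4000 values in [0, 1]).
z = rng.random(4000) ** 3

# Bootstrap: resample the observed values with replacement.
random_z = rng.choice(z, size=1000, replace=True)

# Every drawn value is one of the observed values, so nothing new is created.
print(np.isin(random_z, z).all())
```

Because the bootstrap only ever returns values that were already observed, the result is exactly as coarse as the original sample.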

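Here I assume you want something a bit smoother than that, and that you are not concerned with generating new extreme values.

Use pandas.Series.quantile() and a uniform [0,1] random number generator, as follows.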

Training

Put your random sample into a pandas Series, and call this series S.

Production

1. Generate a random number u between 0.0 and 1.0 in the usual way, e.g., random.random()
2. Return S.quantile(u)
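The two steps above could be sketched as follows. The sample S is a made-up stand-in, and the helper name draw_random_z is hypothetical, not part of pandas:

```python
import random

import pandas as pd

random.seed(0)  # seeded only so the sketch is reproducible

# Training: hypothetical stand-in for the observed sample.
S = pd.Series([random.random() ** 3 for _ in range(4000)])


def draw_random_z(sample):
    """Return one draw by pushing a uniform number through the sample quantiles."""
    u = random.random()        # step 1: uniform in [0.0, 1.0)
    return sample.quantile(u)  # step 2: interpolates between observed values


random_z = [draw_random_z(S) for _ in range(5)]
print(random_z)
```

Series.quantile() interpolates linearly by default, which is why the draws are smoother than a bootstrap but never fall outside the observed minimum and maximum.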

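If you would rather use numpy than pandas, from a quick reading it looks like you can substitute numpy.percentile() in step 2. Note that it expects percentiles on a 0-100 scale, so pass 100*u rather than u.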
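As a rough illustration of the numpy variant, the sketch below draws values with numpy.percentile (0-100 scale) and with numpy.quantile (0-1 scale, available in NumPy 1.15 and later) and checks that they agree. The sample S is again a made-up stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only so the sketch is reproducible

S = rng.random(1000) ** 3  # hypothetical stand-in sample
u = rng.random(5)          # uniform numbers on [0, 1)

via_percentile = np.percentile(S, 100 * u)  # percentile expects 0-100
via_quantile = np.quantile(S, u)            # quantile expects 0-1

print(np.allclose(via_percentile, via_quantile))
```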

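How it works:

From the sample S, pandas.Series.quantile() or numpy.percentile() is used to compute the inverse cumulative distribution function for the inverse transform sampling method. The quantile or percentile function (relative to S) transforms a uniform [0,1] pseudo-random number into a pseudo-random number that has the range and distribution of the sample S.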
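To make the inverse-CDF idea concrete, here is a sketch that builds the empirical inverse CDF by hand and compares it with numpy.quantile. It assumes NumPy's default linear interpolation, under which the k-th of n sorted values is assigned cumulative probability k/(n-1). The name inverse_ecdf is made up for this example:

```python
import numpy as np

rng = np.random.default_rng(2)  # seeded only so the sketch is reproducible

S = rng.random(1000) ** 3  # hypothetical stand-in sample

# Sort the sample and give each sorted value a cumulative probability in [0, 1].
sorted_S = np.sort(S)
probs = np.linspace(0.0, 1.0, len(sorted_S))


def inverse_ecdf(u):
    """Map uniform numbers in [0, 1] to values distributed like S."""
    return np.interp(u, probs, sorted_S)


u = rng.random(10)
manual = inverse_ecdf(u)
print(np.allclose(manual, np.quantile(S, u)))
```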

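Simple example code

If you need to minimize coding and don't want to write and use functions that return only a single realization, then numpy.percentile seems to beat pandas.Series.quantile.

Let S be a pre-existing sample.

u will be the new uniform random numbers.

newR will be the new random numbers drawn from an S-like distribution.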

>>> import numpy as np

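I need a sample of random numbers to replicate, to put into S.

To create an example, I will cube some uniform [0,1] random numbers and call the result the sample S. By generating the example sample this way, I know in advance that the mean of S should be 1/(3+1) = 1/4 = 0.25, because the mean equals the definite integral of (x^3)(dx) from 0 to 1.

In your application you will probably need to do something else, such as reading a file, to create a numpy array S that holds the data sample whose distribution is to be replicated.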

>>> S = pow(np.random.random(1000),3)  # S will be 1000 samples of a power distribution

Here I check that the mean of S is about 0.25, as stated above.

>>> S.mean()
0.25296623781420458 # OK

Get the min and max, just to show how np.percentile works.

>>> S.min()
6.1091277680105382e-10
>>> S.max()
0.99608676594692624

The numpy.percentile function maps 0-100 onto the range of S.

>>> np.percentile(S,0)  # this should match the min of S
6.1091277680105382e-10 # and it does

>>> np.percentile(S,100) # this should match the max of S
0.99608676594692624 # and it does

>>> np.percentile(S,[0,100])  # this should send back an array with both min, max
[6.1091277680105382e-10, 0.99608676594692624]  # and it does

>>> np.percentile(S,np.array([0,100])) # but this doesn't.... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 2803, in percentile
    if q == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

That is not so great if we generate 100 new values, starting from uniforms:

>>> u = np.random.random(100)

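because it raises an error, and because u is on a 0-1 scale while 0-100 is required. (The traceback above comes from an old NumPy on Python 2.7; current NumPy releases accept an array for q, so there only the rescaling is still needed.)

This will work: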

>>> newR = np.percentile(S, (100*u).tolist())

It works fine, but you may need to adjust its type if you want a numpy array.

>>> type(newR)
<type 'list'>

>>> newR = np.array(newR)

Now we have a numpy array. Let's check the mean of the new random values.

>>> newR.mean()
0.25549728059744525 # close enough
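Comparing means is a fairly weak check. As a slightly stronger one, the sketch below compares the deciles of the new values against those of S. It assumes a current NumPy with the Generator API and numpy.quantile, and it rebuilds S and newR with a fixed seed so the snippet is self-contained:

```python
import numpy as np

rng = np.random.default_rng(3)  # seeded only so the sketch is reproducible

S = rng.random(1000) ** 3                   # same kind of example sample as above
newR = np.quantile(S, rng.random(100_000))  # new values drawn from an S-like distribution

# Compare the 10%, 20%, ..., 90% points of the two samples.
deciles = np.linspace(0.1, 0.9, 9)
gap = np.abs(np.quantile(newR, deciles) - np.quantile(S, deciles)).max()

print("largest decile gap:", gap)
print("means:", S.mean(), newR.mean())
```

A small gap suggests that newR follows the shape of S and not just its mean. The new values always stay within the observed min and max of S.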

This Q&A on generating random numbers that replicate an arbitrary distribution in Python comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/23626009/
