python - How can I compute the probability of every row of a NumPy array at once?

Tags: python python-3.x probability

I have a function that computes a probability, shown below:

import math
import numpy as np

def multinormpdf(x, mu, var): # probability density of a multivariate Gaussian distribution
    k = len(x)
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = math.sqrt(((2*math.pi)**k)*det)
    numerator = np.dot((x - mu).transpose(), inv)
    numerator = np.dot(numerator, (x - mu))
    numerator = math.exp(-0.5 * numerator)
    return numerator/denominator

I have a mean vector, a covariance matrix, and a 2-D NumPy array to test with:

mu = np.array([100, 105, 42]) # mean vector
var = np.array([[100, 124, 11], # covariance matrix
               [124, 150, 44],
               [11, 44, 130]])

arr = np.array([[42, 234, 124],  # arr is 43923794 x 3 matrix
                [123, 222, 112],
                [42, 213, 11],
                ...(so many values about 40,000,000 rows),
                [23, 55, 251]])

I have to compute the probability of every row, so I used this code:

for i in arr:
    print(multinormpdf(i, mu, var)) # I already know mean_vector and variance_matrix

But it is far too slow.

Is there a faster way to compute these probabilities? Or is there a way to compute them for the whole test arr in one go, like a "batch"?

Best answer

You can easily vectorize your function:

import numpy as np

def fast_multinormpdf(x, mu, var):
    mu = np.asarray(mu)
    var = np.asarray(var)
    k = x.shape[-1]
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = np.sqrt(((2*np.pi)**k)*det)
    numerator = np.dot((x - mu), inv)
    numerator = np.sum((x - mu) * numerator, axis=-1)
    numerator = np.exp(-0.5 * numerator)
    return numerator/denominator


arr = np.array([[42, 234, 124],
                [123, 222, 112],
                [42, 213, 11],
                [42, 213, 11]])

mu = [0, 0, 1]
var = [[1, 100, 100],
       [100, 1, 100],
       [100, 100, 1]]

slow_out = np.array([multinormpdf(i, mu, var) for i in arr])
fast_out = fast_multinormpdf(arr, mu, var)

np.allclose(slow_out, fast_out) # True
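
The key step in fast_multinormpdf is the row-wise quadratic form: np.dot((x - mu), inv) multiplies every row by the inverse covariance, and the elementwise product followed by np.sum(..., axis=-1) reduces each row to a single number. Multiplying by the transposed differences instead would build an N x N matrix, which is not feasible for millions of rows. As a minimal sketch, the same reduction can be written with np.einsum. The function name einsum_multinormpdf and the small inputs below are made up for illustration and are not part of the original answer; whether this form is faster than the np.sum version was not measured here.

```python
import numpy as np

def einsum_multinormpdf(x, mu, var):
    """Multivariate normal density for every row of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    k = x.shape[-1]
    diff = x - mu                       # broadcasting subtracts mu from every row
    inv = np.linalg.inv(var)
    # Quadratic form diff[n] @ inv @ diff[n] for each row n, without an N x N intermediate
    quad = np.einsum("ni,ij,nj->n", diff, inv, diff)
    norm = np.sqrt(((2 * np.pi) ** k) * np.linalg.det(var))
    return np.exp(-0.5 * quad) / norm

# Illustrative inputs (assumed for this sketch, not taken from the question)
mu = np.array([0.0, 1.0])
var = np.array([[2.0, 0.5],
                [0.5, 1.0]])
points = np.array([[0.0, 1.0],
                   [1.0, 2.0],
                   [-1.0, 0.5]])
densities = einsum_multinormpdf(points, mu, var)
```

The first point equals the mean, so its density is just the normalizing constant, which makes it a convenient sanity check.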

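fast_multinormpdf is close to 1000 times faster than the non-vectorized function (2.12 s versus 2.56 ms in the timings below, roughly 800x):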

long_arr = np.tile(arr, (10000, 1))

%timeit np.array([multinormpdf(i, mu, var) for i in long_arr])
# 2.12 s ± 93.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fast_multinormpdf(long_arr, mu, var)
# 2.56 ms ± 76.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
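
For the array in the question (43923794 x 3), each full-size temporary such as x - mu takes roughly 1 GB with 8-byte elements, and the vectorized function creates several of them. If memory becomes the limit, one option is to apply the same vectorized formula to slices of the array. The sketch below is an assumption-laden illustration, not part of the original answer: the name chunked_multinormpdf, the chunk_size default, and the sample inputs are all hypothetical.

```python
import numpy as np

def chunked_multinormpdf(x, mu, var, chunk_size=1_000_000):
    """Vectorized multivariate normal density, evaluated chunk_size rows at a time."""
    x = np.asarray(x)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    k = x.shape[-1]
    # The inverse and the normalizing constant do not depend on the rows: compute them once
    inv = np.linalg.inv(var)
    norm = np.sqrt(((2 * np.pi) ** k) * np.linalg.det(var))
    out = np.empty(x.shape[0], dtype=float)
    for start in range(0, x.shape[0], chunk_size):
        diff = x[start:start + chunk_size] - mu          # temporaries are at most chunk_size rows
        quad = np.sum((diff @ inv) * diff, axis=-1)      # row-wise quadratic form
        out[start:start + chunk_size] = np.exp(-0.5 * quad) / norm
    return out

# Illustrative inputs (assumed for this sketch, not taken from the question)
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0, -1.0])
var = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.5]])
samples = rng.normal(size=(10, 3))
chunked = chunked_multinormpdf(samples, mu, var, chunk_size=4)
```

The result should not depend on chunk_size; a larger chunk trades memory for fewer Python-level loop iterations.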

This question, "python - How can I compute the probability of every row of a NumPy array at once?", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/53416511/
