python - Invalid value encountered when using pearsonr

Maybe I've got something wrong. If so, I apologize for asking.

I want to calculate Pearson's correlation coefficient using SciPy's pearsonr function.

from scipy.stats import pearsonr

X = [4, 4, 4, 4, 4, 4]
Y = [4, 5, 5, 4, 4, 4]

pearsonr(X, Y)

This produces the following warning:

RuntimeWarning: invalid value encountered in double_scalars

I think the warning appears because E[X] = 4 (the expected value of X is 4), and every element of X is equal to it.

I looked at the code of the pearsonr function in scipy/stats/stats.py. Part of it is shown below.

mx = x.mean()        # 4.0
my = y.mean()        # value not needed for this argument
xm, ym = x-mx, y-my  # xm = [0 0 0 0 0 0]
r_num = n*(np.add.reduce(xm*ym))  # r_num = 0, because xm*ym is a 1x6 zero vector
r_den = n*np.sqrt(ss(xm)*ss(ym))  # r_den = 0, because ss(xm) = 0
r = (r_num / r_den)  # 0/0: "invalid value encountered in double_scalars"
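To see the 0/0 directly, the steps of the excerpt can be replayed with plain NumPy. This is a minimal sketch, not SciPy's actual code: `ss` is replaced by `np.sum(a * a)` (an assumption that it means "sum of squares", which is how the excerpt uses it), and `np.errstate` suppresses the "invalid value" RuntimeWarning, whose exact wording varies between NumPy versions.

```python
import numpy as np

X = [4, 4, 4, 4, 4, 4]
Y = [4, 5, 5, 4, 4, 4]

x = np.asarray(X, dtype=float)
y = np.asarray(Y, dtype=float)
n = len(x)

# Center each vector on its mean; every element of X equals its mean, so xm is all zeros
xm = x - x.mean()
ym = y - y.mean()

# Numerator and denominator as in the excerpt (ss assumed to be a sum of squares)
r_num = n * np.add.reduce(xm * ym)
r_den = n * np.sqrt(np.sum(xm * xm) * np.sum(ym * ym))

# 0.0 / 0.0 on NumPy floats yields nan; errstate silences the RuntimeWarning
with np.errstate(invalid="ignore"):
    r = r_num / r_den

print(r_num, r_den, r)
```

Both the numerator and the denominator come out as exactly 0.0, so the result is nan rather than 0.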

In the end, pearsonr returns (nan, 1.0).

Shouldn't pearsonr return (0, 1.0)?

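I think that if every element of a vector has the same value, the covariance should be zero. So, by the definition of the PCC, the Pearson correlation coefficient should also be zero: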

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
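Written out for samples, the quoted definition becomes the formula below (the normalizing factors, such as the n in the SciPy excerpt, cancel between numerator and denominator). With every x_i equal to the mean of 4, the numerator and the denominator are both 0; for the given Y, the sum of squared deviations is 4/3.

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}

\text{For } X = [4, 4, 4, 4, 4, 4]:\quad \bar{x} = 4,\quad x_i - \bar{x} = 0 \ \text{for all } i
\;\Rightarrow\; r_{XY} = \frac{0}{\sqrt{0 \cdot \tfrac{4}{3}}} = \frac{0}{0}
```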

Is this a bug, or have I gone wrong somewhere?

Best answer

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.

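So r is the covariance divided by (the standard deviation of [4, 5, 5, 4, 4, 4] times the standard deviation of [4, 4, 4, 4, 4, 4]).

The standard deviation of [4, 4, 4, 4, 4, 4] is zero.

So r is the covariance divided by (the standard deviation of [4, 5, 5, 4, 4, 4] times zero).

So r is the covariance divided by zero.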

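Dividing by zero leaves r undefined, whatever the value of the covariance. (Here the covariance is also zero, so the computation is 0/0, which floating-point arithmetic evaluates to nan.)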
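If a caller would rather handle constant input explicitly than rely on the warning, one option is to check for it before dividing. The helper `pearson_r` below is hypothetical (it is not a SciPy function) and uses only the standard library; it returns `math.nan` when either input is constant and otherwise computes r from the same centered sums as the SciPy excerpt.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r; returns nan explicitly when either input is constant.

    Hypothetical helper for illustration only, not part of SciPy.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    # A constant input has zero standard deviation, so r is undefined
    if min(xs) == max(xs) or min(ys) == max(ys):
        return math.nan
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    xm = [x - mx for x in xs]
    ym = [y - my for y in ys]
    num = sum(a * b for a, b in zip(xm, ym))
    den = math.sqrt(sum(a * a for a in xm) * sum(b * b for b in ym))
    return num / den


print(pearson_r([4, 4, 4, 4, 4, 4], [4, 5, 5, 4, 4, 4]))  # nan
print(pearson_r([1, 2, 3], [2, 4, 6]))                    # 1.0
```

Checking `min == max` on the raw data avoids relying on the centered values being exactly zero, which can fail for floats such as 0.1 whose mean is not computed exactly.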

For "python - Invalid value encountered when using pearsonr", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7653993/
