I use R a lot, but I am new to Python. In R, computing the mean, cov, and cor of a given matrix looks like this:
X = matrix(c(1,0.5,3,7,9,6,2,8,4), nrow=3, ncol=3, byrow=FALSE)
X
# [,1] [,2] [,3]
# [1,] 1.0 7 2
# [2,] 0.5 9 8
# [3,] 3.0 6 4
M = colMeans(X) # apply(X,2,mean)
M
# [1] 1.500000 7.333333 4.666667
S = cov(X)
S
# [,1] [,2] [,3]
# [1,] 1.75 -1.750000 -1.500000
# [2,] -1.75 2.333333 3.666667
# [3,] -1.50 3.666667 9.333333
R = cor(X)
R
# [,1] [,2] [,3]
# [1,] 1.0000000 -0.8660254 -0.3711537
# [2,] -0.8660254 1.0000000 0.7857143
# [3,] -0.3711537 0.7857143 1.0000000
I want to reproduce the above in Python, so I tried:
import numpy as np
X = np.array([1,0.5,3,7,9,6,2,8,4]).reshape(3, 3)
X = np.transpose(X) # byrow=FALSE
X
# array([[ 1. , 7. , 2. ],
# [ 0.5, 9. , 8. ],
# [ 3. , 6. , 4. ]])
M = X.mean(axis=0) # colMeans
M
# array([ 1.5 , 7.33333333, 4.66666667])
S = np.cov(X)
S
# array([[ 10.33333333, 10.58333333, 4.83333333],
# [ 10.58333333, 21.58333333, 5.83333333],
# [ 4.83333333, 5.83333333, 2.33333333]])
R = np.corrcoef(X)
R
# array([[ 1. , 0.70866828, 0.98432414],
# [ 0.70866828, 1. , 0.82199494],
# [ 0.98432414, 0.82199494, 1. ]])
The cov and cor results differ from R's. Why?
Best answer
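This is because numpy treats each row as a variable by default, whereas R treats each column as a variable. Either comment out X = np.transpose(X) # byrow=FALSE, or use np.cov(X, rowvar=False).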
np.cov(X, rowvar=False)
array([[ 1.75 , -1.75 , -1.5 ],
[-1.75 , 2.33333333, 3.66666667],
[-1.5 , 3.66666667, 9.33333333]])
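The answer only shows np.cov, but np.corrcoef accepts the same rowvar argument, so the correlation matrix can be brought in line with R in the same way. A minimal sketch, reusing the matrix from the question (the names S and R simply mirror the question's variables):

```python
import numpy as np

# Same matrix as in the question: filled column by column, like R's byrow=FALSE.
X = np.array([1, 0.5, 3, 7, 9, 6, 2, 8, 4]).reshape(3, 3).T

# rowvar=False tells NumPy that each column is a variable,
# which is the convention R's cov() and cor() use for matrices.
S = np.cov(X, rowvar=False)
R = np.corrcoef(X, rowvar=False)

print(S)
print(R)
```

With rowvar=False, X keeps the same layout as in R, so X.mean(axis=0) still corresponds to colMeans(X).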
The difference is explained in the respective documentation (emphasis mine):
Python:
help(np.cov)
rowvar : bool, optional
If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
R:
?cov
var, cov and cor compute the variance of x and the covariance or correlation of x and y if these are vectors. If x and y are matrices then the covariances (or correlations) between the columns of x and the columns of y are computed.
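To make the column-wise convention concrete, the sample covariance can also be computed directly from its definition: center each column, then divide the cross-product by n - 1. This is only an illustrative sketch; the names Xc and S_manual are not from the original post.

```python
import numpy as np

# Same matrix as in the question (columns are the variables).
X = np.array([1, 0.5, 3, 7, 9, 6, 2, 8, 4]).reshape(3, 3).T

n = X.shape[0]                   # number of observations (rows)
Xc = X - X.mean(axis=0)          # subtract each column's mean
S_manual = Xc.T @ Xc / (n - 1)   # sample covariance between columns

# Matches NumPy when columns are declared to be the variables ...
print(np.allclose(S_manual, np.cov(X, rowvar=False)))
# ... and the default row-wise call applied to the transpose.
print(np.allclose(S_manual, np.cov(X.T)))
```

Both np.cov and R's cov divide by n - 1 by default, so the only difference between the two is which axis holds the variables.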
Source: "Difference between R's and Python's cov and cor" on Stack Overflow: https://stackoverflow.com/questions/53116829/