python - Why does standardization differ between Python's 'StandardScaler' and Matlab's 'zscore'?

Tags: python matlab machine-learning data-processing

Why does sklearn.preprocessing.StandardScaler in Python standardize data differently from zscore in Matlab?

Example using sklearn.preprocessing in Python:

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> scaler.fit(data)
>>> print(scaler.mean_)
[ 0.5  0.5]
>>> print(scaler.var_)
[0.25 0.25]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]

The same example in Matlab using the zscore function:

>> data = [[0, 0]; [0, 0]; [1, 1]; [1, 1]];
>> [Sd_data,mean,stdev] = zscore(data)

    Sd_data =
   -0.8660   -0.8660
   -0.8660   -0.8660
    0.8660    0.8660
    0.8660    0.8660

    mean =
    0.5000    0.5000

    stdev =
    0.5774    0.5774

Best answer

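The difference comes down to the degrees-of-freedom correction (ddof, the correction applied when estimating the standard deviation). StandardScaler uses ddof=0, i.e. the population standard deviation, which divides by n. Matlab's zscore defaults to the sample standard deviation, which divides by n - 1 (ddof=1). For the data above this gives a standard deviation of 0.5 versus 0.5774, and therefore scaled values of ±1 versus ±0.8660.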
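To see where the two sets of numbers come from, here is a minimal sketch that computes both conventions for one column of the example data. It uses NumPy directly, which is an assumption for illustration and not part of the original answer; the variable names are hypothetical.

```python
import numpy as np

# One column of the example data from the question.
column = np.array([0.0, 0.0, 1.0, 1.0])

# Population standard deviation: divide by n (ddof=0), as StandardScaler does.
std_population = np.std(column, ddof=0)

# Sample standard deviation: divide by n - 1 (ddof=1), as Matlab's zscore does by default.
std_sample = np.std(column, ddof=1)

# Standardize the column under each convention.
z_population = (column - column.mean()) / std_population
z_sample = (column - column.mean()) / std_sample

print(std_population, z_population)
print(std_sample, z_sample)
```

The mean is the same in both cases; only the divisor used for the standard deviation changes.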

As an alternative, the zscore function in scipy.stats lets you control this parameter when scaling:

from scipy.stats import zscore

zscore(data, ddof=1)
array([[-0.8660254, -0.8660254],
       [-0.8660254, -0.8660254],
       [ 0.8660254,  0.8660254],
       [ 0.8660254,  0.8660254]])

This gives the same output as the Matlab function. Calling zscore with ddof=0 gives the same output as StandardScaler.
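If you would rather keep StandardScaler (for example inside a pipeline), one possible workaround is to rescale its output. Because the two standard deviations differ only by a factor of sqrt(n / (n - 1)), multiplying the StandardScaler result by sqrt((n - 1) / n) should reproduce the ddof=1 values. The following is a sketch under that assumption and is not taken from the original answer; the names n_samples and matlab_like are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]], dtype=float)
n_samples = data.shape[0]

# StandardScaler divides by the population standard deviation (ddof=0).
scaled = StandardScaler().fit_transform(data)

# Convert to the sample-standard-deviation convention (ddof=1).
matlab_like = scaled * np.sqrt((n_samples - 1) / n_samples)

print(matlab_like)
```

For large n the factor approaches 1, so the two conventions differ noticeably only on small datasets such as this four-row example.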

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/49150309/
