I am confused about what exactly the normalize= parameter of RidgeCV in sklearn.linear_model does.

The documentation says:

normalize : bool, default=False This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use :class:sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

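We would usually call subtracting the mean and dividing by the l2-norm "standardization", but the documentation calls it "normalization".
If I understand the documentation correctly, I should use the third code block (the last block) below, since it says: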

If you wish to standardize, please use
:class:`sklearn.preprocessing.StandardScaler` before calling ``fit``
on an estimator with ``normalize=False``.

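So how should I interpret these coefficients? Are they standardized coefficients? Judging by their magnitude, I doubt that they are.

Overall, I am not sure that I have followed the documentation for this normalize parameter correctly.

I will test similar code in other languages and see what I get.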

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
X, y = load_diabetes(return_X_y=True)

No standardization

clf = RidgeCV(normalize=False,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
clf.coef_
print(clf.alpha_)
print(clf.score(X,y))
print(clf.coef_)
0.01 
0.5166287840315846 
[ -7.19945679 -234.55293001 520.58313622 320.52335582 -380.60706569 150.48375154 -78.59123221 130.31305868 592.34958662 71.1337681 ]

Standardize and normalize=True

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)
clf = RidgeCV(normalize=True,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("standardize and normalize=True")
print(clf.alpha_)
print(clf.score(X_std,y))
print(clf.coef_)

standardize and normalize=True
0.01
0.5166287840315843
[ -0.34244324 -11.15654516  24.76161466  15.24574131 -18.10363195
   7.15778213  -3.7382037    6.19836011  28.17519659   3.38348831]

Standardize and normalize=False

clf = RidgeCV(normalize=False,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("standardize and normalize=False")
print(clf.alpha_)
print(clf.score(X_std,y))
print(clf.coef_)

standardize and normalize=False
1.0
0.5175831607267165
[ -0.43127609 -11.33381407  24.77096198  15.37375716 -30.08858903
  16.65328714   1.46208255   7.5211415   32.84392268   3.26632702]

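Best answer

Edit:

There is one more thing to note about the diabetes dataset used in this example.

The data is already normalized (each feature column is mean-centered and scaled to unit l2-norm), so running normalization on it again may not have the effect you are looking for.

It would be better to test with a different dataset.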
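The claim that the diabetes data is already normalized can be checked directly. The snippet below is a minimal sketch: it only inspects the column means and column l2-norms of the array returned by load_diabetes; the variable names col_means and col_norms are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# If the data is already centered and scaled to unit l2-norm per column,
# the means should be (numerically) zero and the norms should be one.
col_means = X.mean(axis=0)
col_norms = np.linalg.norm(X, axis=0)

print(np.round(col_means, 6))
print(np.round(col_norms, 6))
```

If both checks hold, centering the columns and dividing them by their l2-norm leaves X essentially unchanged, which would explain why the run without standardization and the run with normalize=True report the same coefficients.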

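The normalize parameter works the same way as sklearn.preprocessing.Normalizer, which is different from StandardScaler.

The main difference is that Normalizer acts on the rows (observations), while StandardScaler acts on the columns.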
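The row-versus-column distinction between the two preprocessing classes can be seen on a tiny example. This is only an illustrative sketch: X_demo is a made-up matrix, and the code says nothing yet about which of the two behaviors the normalize parameter of RidgeCV follows.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Small made-up matrix: 3 observations (rows) x 2 features (columns).
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])

# Normalizer rescales each ROW to unit l2-norm.
X_rows = Normalizer(norm="l2").fit_transform(X_demo)
print(np.linalg.norm(X_rows, axis=1))

# StandardScaler centers each COLUMN and scales it to unit variance.
X_cols = StandardScaler().fit_transform(X_demo)
print(X_cols.mean(axis=0), X_cols.std(axis=0))
```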

Here is another related post: Difference between standardscaler and Normalizer in sklearn.preprocessing.

That post also links to some other posts you can explore.

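Edit:

The documentation is confusing. After looking through the source code, it appears that the parameter may actually act on the columns rather than the rows, since the axis=0 argument is supplied.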
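To make the column-wise reading concrete, here is a small sketch of what "subtract the mean, then divide by the l2-norm along axis=0" does, and how it relates to StandardScaler. The data is synthetic and deliberately unscaled, and the names X_raw, X_norm and X_std are assumptions for illustration; this emulates the preprocessing by hand rather than calling the estimator.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

rng = np.random.default_rng(0)
# Synthetic features on very different scales (illustration only).
X_raw = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(50, 3))
n_samples = X_raw.shape[0]

# Emulate the preprocessing: center each column, then divide it by its l2-norm.
X_centered = X_raw - X_raw.mean(axis=0)
X_norm, col_norms = normalize(X_centered, axis=0, return_norm=True)

# StandardScaler divides by the (population) standard deviation instead.
X_std = StandardScaler().fit_transform(X_raw)

# The two scalings differ only by a constant factor of sqrt(n_samples).
print(np.allclose(X_std, X_norm * np.sqrt(n_samples)))
```

Under this reading, column-wise normalization and standardization produce the same features up to a single constant, so they differ in how the ridge penalty alpha is effectively scaled rather than in the shape of the data.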

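One way to test this is to use the normalize function from sklearn.preprocessing and compare its result with what you get from passing the normalize parameter.

Here is the code that does the preprocessing (f_normalize is the same as the linked function).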

            if normalize:
                X, X_scale = f_normalize(X, axis=0, copy=False,
                                         return_norm=True)

I think you can try the following and see whether you get the same result as when you just use the normalize parameter.

from sklearn.preprocessing import normalize

X_std= normalize(X,axis=0,return_norm=False)
clf = RidgeCV(normalize=False,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("normalize() with axis=0 and normalize=False")
print(clf.alpha_)
print(clf.score(X_std,y))
print(clf.coef_)

normalize() with axis=0 and normalize=False
0.01
0.5166287840315835
[  -7.19945679 -234.55293001  520.58313622  320.52335582 -380.60706569
  150.48375154  -78.59123221  130.31305868  592.34958662   71.1337681 ]

This gives the same result as the following:

X, y = load_diabetes(return_X_y=True)

clf = RidgeCV(normalize=True,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
print("normalize=True")
print(clf.alpha_)
print(clf.score(X_std,y))
print(clf.coef_)

normalize=True
0.01
0.5166287840315835
[  -7.19945679 -234.55293001  520.58313622  320.52335582 -380.60706569
  150.48375154  -78.59123221  130.31305868  592.34958662   71.1337681 ]
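As for how to read the coefficients in the question: in the "standardize and normalize=True" run they are the unstandardized ones divided by roughly 21.02 (for example 520.58313622 / 24.76161466), which matches sqrt(442), the square root of the number of samples in the diabetes data. The sketch below tries to reproduce that relationship with plain Ridge on synthetic data. It emulates normalize=True by hand, because recent scikit-learn releases no longer accept that parameter (as far as I know it was deprecated in 1.0 and removed in 1.2). The data and the names X_raw, y_demo, ridge_norm and ridge_std are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 200
X_raw = rng.normal(loc=3.0, scale=[1.0, 5.0, 20.0], size=(n_samples, 3))
y_demo = X_raw @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n_samples)

# Manual stand-in for normalize=True: center columns, divide by column l2-norm.
X_centered = X_raw - X_raw.mean(axis=0)
X_norm = X_centered / np.linalg.norm(X_centered, axis=0)
# Standardized features are the normalized ones times sqrt(n_samples).
X_std = StandardScaler().fit_transform(X_raw)

alpha = 0.01
ridge_norm = Ridge(alpha=alpha).fit(X_norm, y_demo)
# Scaling the features by sqrt(n) requires scaling alpha by n to match.
ridge_std = Ridge(alpha=alpha * n_samples).fit(X_std, y_demo)

# Same predictions; coefficients differ by a factor of sqrt(n_samples).
print(np.allclose(ridge_norm.predict(X_norm), ridge_std.predict(X_std)))
print(np.allclose(ridge_norm.coef_, ridge_std.coef_ * np.sqrt(n_samples)))
```

This would also suggest why the "standardize and normalize=False" run in the question picks a different alpha and slightly different coefficients: on standardized features, an alpha of 1.0 corresponds to about 1/442 on the normalized scale, a value that is not in the alphas grid used for the other runs.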

This Q&A on "python - What exactly does the sklearn.linear_model RidgeCV normalize= parameter do" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/60216879/
