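python-3.x - How to write a custom transformer in scikit-learn that conditionally switches between different classes

Tags: python-3.x oop machine-learning scikit-learn

I am writing a class that can switch between different scalers. The following "works" (but does not switch between scalers):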

from sklearn.preprocessing import StandardScaler, MinMaxScaler

class CustomTransformer(StandardScaler, MinMaxScaler):
    def __init__(self, which,with_std=True,with_mean=True, feature_range=(0,1)):
        self.which = which
        self.with_mean = with_mean
        self.with_std = with_std
        self.feature_range = feature_range
        if which=="standard":
            self = StandardScaler.__init__(self)
        else:
            self = MinMaxScaler.__init__(self)

X = [[1,2,3],[3,4,5],[6,7,8]]


ct = CustomTransformer(which="standard")    
ct.fit_transform(X)
array([[-1.13554995, -1.13554995, -1.13554995],
       [-0.16222142, -0.16222142, -0.16222142],
       [ 1.29777137,  1.29777137,  1.29777137]])

ct = CustomTransformer(which="")
ct.fit_transform(X)
array([[-1.13554995, -1.13554995, -1.13554995],
       [-0.16222142, -0.16222142, -0.16222142],
       [ 1.29777137,  1.29777137,  1.29777137]])

So my question is more of a theoretical one:

What is the correct way to do conditional multiple inheritance in scikit-learn so that the class switches between scalers?

Best answer
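
As background, here is a minimal sketch of why the inheritance approach in the question never switches. The class name `Switching` is hypothetical and used only for illustration. Assigning to `self` inside `__init__` only rebinds a local name, and method lookup follows the class's method resolution order, which lists `StandardScaler` before `MinMaxScaler` regardless of the `which` argument.

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler


class Switching(StandardScaler, MinMaxScaler):  # hypothetical name
    def __init__(self, which):
        self.which = which
        # __init__ returns None, so this only rebinds the local name `self`;
        # the instance under construction is unaffected by the assignment.
        self = MinMaxScaler.__init__(self)


# The method resolution order is fixed when the class is defined.
mro_names = [cls.__name__ for cls in Switching.__mro__]
print(mro_names[:3])

obj = Switching(which="minmax")
# fit is looked up along the MRO, so StandardScaler's version is found first.
print(Switching.fit is StandardScaler.fit)
```

Because the base classes are chosen at class-definition time, composition (holding a scaler as an attribute) is usually a simpler way to switch behaviour at run time than inheritance.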

This "just" works:

from sklearn.base import TransformerMixin
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = [[1,2,3],[3,4,5],[6,7,8]]

class CustomTransformer(TransformerMixin):
    def __init__(self, condition,with_mean=True, with_std=True, feature_range=(0,1), **kwargs):
        self.condition = condition
        if condition:
            self.scaler = StandardScaler(with_mean=with_mean, with_std=with_std, **kwargs)
        else:
            self.scaler = MinMaxScaler(feature_range=feature_range, **kwargs)
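    def fit(self, X, y=None):
        # Fit the wrapped scaler, then return self as scikit-learn expects.
        self.scaler.fit(X)
        return self
    def transform(self, X):
        return self.scaler.transform(X)
    def get_params(self, deep=True):
        # Accept `deep` so callers such as clone() can pass deep=False.
        d = self.scaler.get_params(deep=deep)
        d['condition'] = self.condition
        return d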
ct = CustomTransformer(False, feature_range=(0,.1))
ct.fit_transform(X)
array([[0.  , 0.  , 0.  ],
       [0.04, 0.04, 0.04],
       [0.1 , 0.1 , 0.1 ]])
ct = CustomTransformer(True, feature_range=(0,.1))
ct.fit_transform(X)
array([[-1.13554995, -1.13554995, -1.13554995],
       [-0.16222142, -0.16222142, -0.16222142],
       [ 1.29777137,  1.29777137,  1.29777137]])
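
A variant that follows scikit-learn's estimator conventions more closely may also be worth considering. The sketch below is not part of the original answer: the class name `SwitchableScaler`, the `which` values `"standard"` and `"minmax"`, and the attribute name `scaler_` are all assumptions made for illustration. It stores the constructor arguments unchanged, inherits `get_params` and `set_params` from `BaseEstimator`, and builds the wrapped scaler in `fit`, so a later `set_params(which=...)` takes effect.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import MinMaxScaler, StandardScaler


class SwitchableScaler(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper that delegates to one of two scalers."""

    def __init__(self, which="standard", with_mean=True, with_std=True,
                 feature_range=(0, 1)):
        # Only store the arguments; BaseEstimator derives get_params and
        # set_params from this signature.
        self.which = which
        self.with_mean = with_mean
        self.with_std = with_std
        self.feature_range = feature_range

    def fit(self, X, y=None):
        # Build the delegate at fit time so parameter changes are honoured.
        if self.which == "standard":
            self.scaler_ = StandardScaler(with_mean=self.with_mean,
                                          with_std=self.with_std)
        elif self.which == "minmax":
            self.scaler_ = MinMaxScaler(feature_range=self.feature_range)
        else:
            raise ValueError(f"unknown scaler: {self.which!r}")
        self.scaler_.fit(X)
        return self

    def transform(self, X):
        return self.scaler_.transform(X)


X = [[1, 2, 3], [3, 4, 5], [6, 7, 8]]
standard = SwitchableScaler(which="standard").fit_transform(X)
# clone() and set_params() work because __init__ only stores its arguments.
minmax = clone(SwitchableScaler()).set_params(which="minmax").fit_transform(X)
print(standard[0], minmax[1])
```

With this layout, `which` is an ordinary hyperparameter, so it can appear in a parameter grid like any other.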

This CustomTransformer now exposes its parameters to GridSearchCV through .get_params():

from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(ct, param_grid={})
gs.get_params()
{'cv': None,
 'error_score': nan,
 'estimator__copy': True,
 'estimator__with_mean': True,
 'estimator__with_std': True,
 'estimator__condition': True,
 'estimator': <__main__.CustomTransformer at 0x7fbd8d3aa9d0>,
 'iid': 'deprecated',
 'n_jobs': None,
 'param_grid': {},
 'pre_dispatch': '2*n_jobs',
 'refit': True,
 'return_train_score': False,
 'scoring': None,
 'verbose': 0}
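
The empty `param_grid` above only shows that the parameters are visible; it does not search over anything. If the goal is to let a grid search choose between scalers, one alternative to a custom class is to swap the pipeline step itself, which scikit-learn's `Pipeline` supports through its parameter interface. This is a different technique from the one in the answer. The step names `"scaler"` and `"clf"`, the iris dataset, and the `LogisticRegression` classifier are assumptions chosen only to make the sketch runnable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_iris(return_X_y=True)

# The "scaler" step is a placeholder that the grid search replaces.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each candidate is a whole estimator substituted into the named step.
param_grid = {"scaler": [StandardScaler(), MinMaxScaler()]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X, y)
print(type(search.best_params_["scaler"]).__name__)
```

Which scaler wins depends on the data and the downstream model, so the printed name should be treated as a result of this particular setup rather than a general recommendation.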

Source: the original question on Stack Overflow, "How to write a custom transformer in scikit-learn that conditionally switches between different classes": https://stackoverflow.com/questions/62212734/