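python - How to loop over both unscaled and scaled data in a Python for loop

Tags: python pandas scikit-learn

I run the following code, which fits models to the iris data using different modeling techniques. How can I add a second step to this process so that I can show the improvement between using scaled and unscaled data?

I don't need to run the scaling transform outside of the loop; I've just had a lot of trouble converting the data type from a pandas DataFrame to an np array and back again.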

from sklearn import datasets
from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)

# Scaled copies of the data (not yet used in the loop below).
sc = StandardScaler()
X_train_scale = sc.fit_transform(X_train)
X_test_scale = sc.transform(X_test)

numFolds = 10
# KFold now lives in sklearn.model_selection and takes n_splits;
# the fold indices come from kf.split(...).
kf = KFold(n_splits=numFolds, shuffle=True)

# These are "Class objects". For each Class, find the accuracy through
# 10 fold cross validation.
Models = [LogisticRegression, svm.SVC]
params = [{}, {}]

for param, Model in zip(params, Models):
    total = 0
    for train_indices, test_indices in kf.split(X_train):
        train_X = X_train[train_indices]
        train_Y = y_train[train_indices]
        test_X = X_train[test_indices]
        test_Y = y_train[test_indices]

        reg = Model(**param)
        reg.fit(train_X, train_Y)
        predictions = reg.predict(test_X)
        total += accuracy_score(test_Y, predictions)
    accuracy = total / numFolds

    print("CV accuracy score of {0}: {1}".format(Model.__name__, round(accuracy, 6)))

So ideally my output would be:

CV standard accuracy score of LogisticRegression: 0.683333
CV scaled accuracy score of LogisticRegression: 0.766667
CV standard accuracy score of SVC: 0.766667
CV scaled accuracy score of SVC: 0.783333

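In case this is still unclear: I am trying to loop over the scaled and unscaled data, similar to how I loop over the different machine learning algorithms.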
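A minimal sketch of what such a loop could look like. It swaps the manual fold loop for `cross_val_score`, and wraps the scaler and model in a pipeline so that the scaler is re-fit on each training fold. The `scalers` labels, the `results` dictionary, and the `random_state` values are assumptions added for illustration, not part of the original code.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features, as in the question
y = iris.target

# random_state is an assumption, added only to make the run repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)

models = [LogisticRegression, SVC]
# Hypothetical labels: None means "no scaling", otherwise a scaler class.
scalers = [("standard", None), ("scaled", StandardScaler)]

results = {}
for Model in models:
    for label, Scaler in scalers:
        # With a scaler, build a pipeline so scaling is fit per training fold.
        if Scaler is None:
            estimator = Model()
        else:
            estimator = make_pipeline(Scaler(), Model())
        scores = cross_val_score(
            estimator, X_train, y_train, cv=kf, scoring="accuracy")
        results[(Model.__name__, label)] = scores.mean()
        print("CV {0} accuracy score of {1}: {2}".format(
            label, Model.__name__, round(scores.mean(), 6)))
```

Because the data stays a NumPy array throughout, no conversion to or from a pandas DataFrame is needed in this sketch.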

Best answer

I wanted to follow up on this. I was able to do it by creating a pipeline and using GridSearchCV:

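from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# Each grid entry replaces a whole pipeline step; None skips the scaling step.
pipe = Pipeline([('scale', StandardScaler()),
                 ('clf', LogisticRegression())])
param_grid = [{
        'scale': [None, StandardScaler()],
        'clf': [SVC(), LogisticRegression()]}]
grid_search = GridSearchCV(pipe, param_grid=param_grid, n_jobs=-1, verbose=1)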

In the end, this gave me the result I wanted and made it easy to test how the models perform with and without scaling.
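For reference, here is a rough end-to-end sketch of how the grid search might be fitted and its per-combination scores read back from `cv_results_`. The train/test split, `cv=10`, the `random_state`, the `rows` list, and the "scaled"/"standard" labels are assumptions chosen to mirror the question. The string `'passthrough'` is used here in place of `None` to skip the scaling step.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
# random_state is an assumption, added only to make the run repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
param_grid = {
    "scale": ["passthrough", StandardScaler()],  # unscaled vs. scaled
    "clf": [SVC(), LogisticRegression()],
}
grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=10, scoring="accuracy")
grid_search.fit(X_train, y_train)

# Collect one (model name, label, mean CV accuracy) row per grid combination.
rows = []
cv_results = grid_search.cv_results_
for params, mean_score in zip(cv_results["params"], cv_results["mean_test_score"]):
    label = "standard" if isinstance(params["scale"], str) else "scaled"
    name = type(params["clf"]).__name__
    rows.append((name, label, mean_score))
    print("CV {0} accuracy score of {1}: {2}".format(
        label, name, round(mean_score, 6)))
```

The grid has two options for each of the two steps, so four combinations are evaluated; `grid_search.best_params_` then reports which one scored highest.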

Regarding "python - How to loop over both unscaled and scaled data in a Python for loop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50712584/
