python-3.x - Can we apply feature scaling to the "independent variable" in a dataset?

Tags: python-3.x machine-learning data-science

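I have a dataset with 8 independent variables (features), 2 of which are categorical. I applied ExtraTreeClassifier() to eliminate some of these variables. I also applied feature scaling to X and y: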

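    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # One scaler per array, so each keeps the statistics needed for inverse_transform
    sc_X = StandardScaler()
    sc_y = StandardScaler()
    X = sc_X.fit_transform(X)  # fit_transform already scales; no extra transform() call
    y = sc_y.fit_transform(np.asarray(y).reshape(-1, 1))  # StandardScaler expects 2-D input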
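Note that fit_transform already returns standardized data, so calling transform on that output standardizes it a second time, using the mean and standard deviation learned from the raw data. Likewise, calling fit_transform on y with the same sc object overwrites the statistics learned from X, so sc can no longer undo the scaling of X. A minimal sketch with a hypothetical one-column toy array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])  # hypothetical toy feature column (mean 2)

sc = StandardScaler()
once = sc.fit_transform(X)   # standardized: mean 0, unit variance
twice = sc.transform(once)   # subtracts the raw mean and divides by the raw std again

print(once.ravel())
print(twice.ravel())         # shifted and rescaled a second time, no longer mean 0
```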

After that, I split the dataset:

    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X_new, encoded2,
                                                        test_size=0.25, random_state=0)

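Now I am applying the DecisionTreeRegressor algorithm to make predictions, but I want the actual predicted values (right now I am getting scaled values). How can I get them? Is there another way to do this? With my current approach the RMSE is 0.02, whereas without feature scaling the RMSE on the dependent variable is 18.4. Please suggest how to handle this kind of problem.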

Best Answer

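First, there is no need to scale the target variable (y). But if you do scale it, StandardScaler and various other preprocessing transformers provide an inverse_transform function, which you can use to get back the original values.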
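For a tree-based model such as DecisionTreeRegressor, standardizing y is an affine change that leaves the order of candidate splits unchanged, so fitting on the raw target and fitting on the standardized target (then inverse-transforming) should give the same predictions. A minimal sketch, assuming synthetic data as a hypothetical stand-in for the question's features and target:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data standing in for the question's features and target
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, size=200)

# Option A: fit the tree directly on y in its original units
tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
pred_raw = tree_raw.predict(X)

# Option B: standardize y, fit, then map predictions back with inverse_transform
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()
tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y_scaled)
pred_back = y_scaler.inverse_transform(tree_scaled.predict(X).reshape(-1, 1)).ravel()

# Both routes choose the same splits, so the predictions agree
print(np.allclose(pred_raw, pred_back))
```

For this kind of model, skipping the target scaling is simpler and avoids the extra inverse step.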

From the StandardScaler documentation:

inverse_transform(X[, copy]) Scale back the data to the original representation
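Applied to the question's workflow, a minimal sketch assuming synthetic data in place of the asker's X_new and encoded2 (y_scaler and the other names are hypothetical): fit a dedicated scaler on the training target, predict, and call inverse_transform on the predictions. It also shows why the two RMSE figures in the question are not directly comparable: an RMSE computed on standardized y is measured in units of the target's standard deviation, and multiplying it by the scaler's scale_ gives the RMSE in the original units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data in place of the asker's X_new and encoded2
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(300, 4))
y = 10 * X[:, 0] + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the target scaler on the training targets only; StandardScaler needs 2-D input
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train_scaled)

# The model predicts in scaled units...
pred_scaled = model.predict(X_test)
# ...so map the predictions back to the original units of y
pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

# RMSE in scaled units vs. original units
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).ravel()
rmse_scaled = np.sqrt(mean_squared_error(y_test_scaled, pred_scaled))
rmse_original = np.sqrt(mean_squared_error(y_test, pred))
print(rmse_scaled, rmse_original, rmse_scaled * y_scaler.scale_[0])
```

Only rmse_original is on the same scale as an RMSE computed without target scaling.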

For "python-3.x - Can we apply feature scaling to the "independent variable" in a dataset?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52447807/
