python - "Too many indices for array" when using sklearn's imputer class

Tags: python pandas machine-learning scikit-learn imputation

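I'm practicing machine learning on a dataset. To fill in the missing values I used the imputer class, but it raised an error: too many indices for array. I looked through the numpy modules for this error, but I have no idea how to fix it.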

import numpy as np
import matplotlib.pyplot as mlp
import pandas as pd

#import datasets
i_export = pd.read_csv("2018-2010_export.csv")
x=i_export.iloc[:, [0,1,3,4]].values
y=i_export.iloc[:,2].values

#splitting training test set
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0)

#calculating missing data
from sklearn.impute import SimpleImputer
impute=SimpleImputer(missing_values=np.nan,strategy='mean')
impute=SimpleImputer.fit(y_test[:,0])
y_test[:,0]=SimpleImputer.fit_transform(y_test[:,0])

Best answer

I think y_test is a one-dimensional array. When you index it as y_test[:,0], you are indexing along two dimensions, which a 1-D array does not have.

You can convert y_test into a two-dimensional array with one column and "n" rows using this snippet: y_test = y_test.reshape(-1,1)
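To make the shape issue concrete, here is a minimal sketch. The array values are hypothetical stand-ins for y_test and do not come from the asker's CSV file.

```python
import numpy as np

# Hypothetical stand-in for y_test: a 1-D float array with one missing value
y_test = np.array([1.0, np.nan, 3.0])
print(y_test.ndim, y_test.shape)

# Using two indices on a 1-D array raises IndexError
raised = False
try:
    y_test[:, 0]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# reshape(-1, 1) gives one column and as many rows as there are elements
y_2d = y_test.reshape(-1, 1)
print(y_2d.ndim, y_2d.shape)
```

After the reshape, y_2d[:, 0] is a valid index, and the array has the 2-D layout that scikit-learn transformers expect.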

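Here is my change to your code. I also changed how you use SimpleImputer: fit_transform is called on the imputer instance rather than on the SimpleImputer class.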

from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=np.nan,strategy='mean')
y_test=imputer.fit_transform(y_test.reshape(-1, 1))
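As a self-contained variation on the snippet above, the sketch below fits the imputer on a training array and then applies it to the test array. This is a common practice, not what the answer itself does. The names y_train and y_test_filled and all values are hypothetical. The final ravel() is optional and only returns the result to 1-D.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical 1-D arrays standing in for the asker's split data
y_train = np.array([2.0, 4.0, np.nan, 6.0])
y_test = np.array([1.0, np.nan, 3.0])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# SimpleImputer expects 2-D input, so reshape to a single column first
imputer.fit(y_train.reshape(-1, 1))

# Fill the missing test values with the mean learned from y_train,
# then flatten back to the original 1-D shape
y_test_filled = imputer.transform(y_test.reshape(-1, 1)).ravel()
print(y_test_filled)
```

If you prefer to follow the answer exactly, replace the fit and transform calls with a single imputer.fit_transform(y_test.reshape(-1, 1)); the missing values are then filled with the mean of y_test itself.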

For more on python - "Too many indices for array" when using sklearn's imputer class, see the similar question on Stack Overflow: https://stackoverflow.com/questions/58068428/
