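I am trying to use the train_test_split method from sklearn in Python,
but the training and test sets end up with different lengths.
When I run the following code in a Jupyter notebook, it shows
"ValueError: Found input variables with inconsistent numbers of samples: [111, 38]"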
import pandas as pd
df=pd.read_csv("iris.data")
df.head(5)
x=df[['5.1','3.5','1.4','0.2']].values
y=df[['Iris-setosa']].values
from sklearn.model_selection import train_test_split
x_train,y_train,x_test,y_test=train_test_split(x,y,random_state=42)
print("train set: ",x_train.shape,y_train.shape)
print("test set: ",x_test.shape,y_test.shape)
from sklearn.neighbors import KNeighborsClassifier
k=4
neigh=KNeighborsClassifier(n_neighbors=k).fit(x_train,y_train)
neigh
Best answer
Change the line:
x_train,y_train,x_test,y_test=train_test_split(x,y,random_state=42)
to:
x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=42)
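and you should be good to go! train_test_split returns the pieces in the order x_train, x_test, y_train, y_test, so with the original unpacking the variable named y_train actually held the 38 test rows of x. That is why fit() saw 111 samples on one side and 38 on the other.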
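To see the return order in isolation, here is a minimal sketch on synthetic arrays rather than the asker's data. The 149-row size is an assumption, chosen because it reproduces the 111/38 split from the error message under the default 25% test size.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data: 149 rows, 4 features (assumed sizes).
x = np.zeros((149, 4))
y = np.zeros(149)

# With two input arrays, the return order is:
# x_train, x_test, y_train, y_test
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)

print("train set:", x_train.shape, y_train.shape)
print("test set:", x_test.shape, y_test.shape)
```

With this unpacking, x_train and y_train have the same number of rows, which is what fit() requires.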
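By the way, your data file has no header row. So, to (i) load it correctly and (ii) avoid a warning when fitting the model, you should put this at the top of your code: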
df = pd.read_csv("iris.data", header=None)
x=df.iloc[:,[0,1,2,3]].values
y=df.iloc[:,[4]].values.ravel()
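The effect of header=None and ravel() can be checked without the real file. The sketch below reads three illustrative rows from an in-memory string; the rows and the names raw, df_default and y_2d are assumptions made for this example, laid out in the same comma-separated format as iris.data.

```python
import io
import pandas as pd

# A few illustrative rows in the iris.data layout (no header row).
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# Default behaviour: the first data row is consumed as column names,
# which is why the question indexes columns like '5.1' and 'Iris-setosa'.
df_default = pd.read_csv(io.StringIO(raw))
print(list(df_default.columns), len(df_default))

# header=None: every row is data and the columns are numbered 0..4.
df = pd.read_csv(io.StringIO(raw), header=None)
x = df.iloc[:, [0, 1, 2, 3]].values
y_2d = df.iloc[:, [4]].values  # shape (3, 1): a column vector
y = y_2d.ravel()               # shape (3,): flat 1-D label array
print(x.shape, y_2d.shape, y.shape)
```

Passing the column-vector form of y to fit() is what usually triggers scikit-learn's DataConversionWarning; the flattened 1-D array avoids it.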
Source: a similar question on Stack Overflow about inconsistent numbers of samples in sklearn (Python): https://stackoverflow.com/questions/54559614/