python - Mix of strings and numbers in the target variable

Tags: python python-3.x machine-learning scikit-learn

I am testing the code below, but I get an error on the last line.

dataset = df[['Rate', 'Weights', 'Change', 'Price', 'CategoryOne']].copy()
dataset.shape


X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)



from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)


#Import knearest neighbors Classifier model
from sklearn.neighbors import KNeighborsClassifier

#Create KNN Classifier
knn = KNeighborsClassifier(n_neighbors=5)

#Train the model using the training sets
knn.fit(X_train, y_train)

#Predict the response for test dataset
y_pred = knn.predict(X_test)


#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))



#Import knearest neighbors Classifier model
from sklearn.neighbors import KNeighborsClassifier

#Create KNN Classifier
knn = KNeighborsClassifier(n_neighbors=7)

#Train the model using the training sets
knn.fit(X_train, y_train)

On the last line, when I try to fit X_train and y_train, I get the following error:

TypeError: '<' not supported between instances of 'int' and 'str'

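The data in the CategoryOne field looks like this: '2a', '1', '2a'. Could this be the problem? I know the target variable does not have to be numeric. I just want to see the relationship between the independent variables and the dependent variable (CategoryOne).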

Here is the stack trace:

Traceback (most recent call last):

  File "<ipython-input-108-36266936f0ca>", line 29, in <module>
    knn.fit(X_train, y_train)

  File "C:\Users\rs\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\neighbors\base.py", line 906, in fit
    check_classification_targets(y)

  File "C:\Users\rs\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 166, in check_classification_targets
    y_type = type_of_target(y)

  File "C:\Users\rs\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 287, in type_of_target
    if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):

  File "C:\Users\rs\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 264, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts)

  File "C:\Users\rs\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\arraysetops.py", line 312, in _unique1d
    ar.sort()

TypeError: '<' not supported between instances of 'int' and 'str'

Best answer

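The traceback shows np.unique failing while it sorts y, which suggests the CategoryOne column holds both int and str values. You can try explicitly converting the CategoryOne column to string data when constructing dataset, as follows: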

dataset = df[['Rate', 'Weights', 'Change', 'Price', 'CategoryOne']].copy()
dataset['CategoryOne'] = dataset['CategoryOne'].map(lambda x : str(x))
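To check whether mixed types really are the cause, the following is a minimal sketch under stated assumptions: the small `df` is made-up stand-in data (not the asker's), only two feature columns are used, and `astype(str)` is used in place of the `map(lambda x : str(x))` call above, which should behave the same way for these values. It lists the Python types found in the target column, reproduces the sorting failure with `np.unique`, and then fits the classifier on string labels.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for the asker's df: the target column mixes int and str.
df = pd.DataFrame({
    "Rate": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "Price": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
    "CategoryOne": ["2a", 1, "2a", 1, "2a", 1],
})

# List which Python types actually occur in the target column.
type_names = sorted(df["CategoryOne"].map(lambda v: type(v).__name__).unique())
print(type_names)

# np.unique sorts its input; sorting mixed int/str objects raises TypeError.
try:
    np.unique(df["CategoryOne"].values)
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True
print(mixed_sort_failed)

# Cast every label to str so all labels are mutually comparable, then fit.
X = df[["Rate", "Price"]].values
y = df["CategoryOne"].astype(str).values
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
classes = list(knn.classes_)
print(classes)
```

If the type listing shows more than one type for your own column, the string conversion above is a reasonable first thing to try. Note that after the conversion the integer 1 and the string '1' become the same class.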

Regarding "python - Mix of strings and numbers in the target variable", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59673487/
