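I'm following along with Google's machine learning videos and have finished a program that uses a dataset containing information about flowers. The program runs successfully, but I'm having trouble understanding the results: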
from scipy.spatial import distance

def euc(a, b):
    # Euclidean distance between two feature vectors
    return distance.euclidean(a, b)

class ScrappyKNN():
    def fit(self, x_train, y_train):
        # "Training" only memorizes the training data
        self.x_train = x_train
        self.y_train = y_train

    def predict(self, x_test):
        # Label each test row with the label of its nearest training row
        predictions = []
        for row in x_test:
            label = self.closest(row)
            predictions.append(label)
        return predictions

    def closest(self, row):
        # Linear scan for the training point nearest to `row`
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]

from sklearn import datasets
iris = datasets.load_iris()
x = iris.data
y = iris.target

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size =.5)
print(x_train.shape, x_test.shape)

my_classifier = ScrappyKNN()
my_classifier.fit(x_train, y_train)
prediction = my_classifier.predict(x_test)

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, prediction))
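The result is as follows: (75, 4) (75, 4) 0.96
The 96% is the accuracy, but what exactly do the 75 and the 4 represent?
Best answer
You are printing the shapes of the datasets on this line: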
print(x_train.shape, x_test.shape)
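x_train and x_test each have 75 rows (i.e. data points) and 4 columns (i.e. features). The iris dataset contains 150 samples, so each half receives 75. Unless you have an odd number of data points, these dimensions should be the same, because you are performing a 50/50 train/test split on this line: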
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size =.5)
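To make the shapes concrete, here is a minimal sketch that prints them before and after the split. It assumes the same iris data as the question. The `random_state=0` argument and the 149-row slice are additions for illustration only: the first makes the split repeatable, the second shows what happens with an odd number of rows.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x, y = iris.data, iris.target

# Full dataset: one row per flower, one column per measurement.
print(x.shape)  # (150, 4)

# The four columns are the four measurements taken for each flower.
print(iris.feature_names)

# random_state is an assumption added here so the split is repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

# shape is (number of rows, number of columns).
n_rows, n_features = x_train.shape
print(n_rows, n_features)  # 75 4

# With an odd number of rows (149 here, purely for illustration),
# the two halves cannot be equal and differ by one row.
x_odd_train, x_odd_test = train_test_split(x[:149], test_size=0.5, random_state=0)
print(x_odd_train.shape, x_odd_test.shape)
```

The accuracy is printed by a separate `print` call, which is why 0.96 appears apart from the two shape tuples.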
Regarding "python - What do the results of a Sci-Kit machine learning program represent?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57121135/