python - What do the results of a Sci-Kit machine learning program represent?

Tags: python machine-learning scikit-learn

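I'm following Google's machine learning videos and have finished a program that uses a dataset of information about flowers. The program runs successfully, but I'm having trouble understanding the output: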

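from scipy.spatial import distance

def euc(a, b):
    # Euclidean (straight-line) distance between two feature vectors
    return distance.euclidean(a, b)

class ScrappyKNN:
    """A bare-bones 1-nearest-neighbour classifier."""

    def fit(self, x_train, y_train):
        # "Training" just stores the data; all the work happens at prediction time
        self.x_train = x_train
        self.y_train = y_train

    def predict(self, x_test):
        # Predict one label per test row
        predictions = []
        for row in x_test:
            label = self.closest(row)
            predictions.append(label)
        return predictions

    def closest(self, row):
        # Return the label of the training row nearest to `row`
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]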

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Iris: 150 flowers, 4 numeric features each, 3 species labels
iris = datasets.load_iris()
x = iris.data
y = iris.target

# Random 50/50 split, so the accuracy can vary slightly between runs
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)

print(x_train.shape, x_test.shape)

my_classifier = ScrappyKNN()
my_classifier.fit(x_train, y_train)
prediction = my_classifier.predict(x_test)

print(accuracy_score(y_test, prediction))
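For readers without scipy, the same 1-nearest-neighbour idea behind ScrappyKNN can be sketched with only the standard library. This is a minimal sketch, not the video's code: the class name OneNearestNeighbor and the toy data are hypothetical, and math.dist assumes Python 3.8+. Like the strict `<` in closest, min() keeps the first training row when two distances tie.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length feature vectors
    return math.dist(a, b)

class OneNearestNeighbor:
    """Hypothetical stdlib-only stand-in for ScrappyKNN (k = 1)."""

    def fit(self, x_train, y_train):
        # "Training" a 1-NN model just memorises the data
        self.x_train = list(x_train)
        self.y_train = list(y_train)
        return self

    def predict(self, x_test):
        # One predicted label per test row
        return [self._closest(row) for row in x_test]

    def _closest(self, row):
        # Index of the training row with the smallest distance to `row`;
        # min() returns the first such index on ties
        best_index = min(range(len(self.x_train)),
                         key=lambda i: euclidean(row, self.x_train[i]))
        return self.y_train[best_index]

# Made-up 2-feature toy data (not taken from the iris dataset)
x_train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.5, 4.5)]
y_train = [0, 0, 1, 1]
model = OneNearestNeighbor().fit(x_train, y_train)
print(model.predict([(0.9, 1.1), (5.2, 4.9)]))  # [0, 1]
```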

The output is: (75, 4) (75, 4) 0.96

0.96 is the accuracy (96%), but what do the 75 and the 4 actually represent?
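With its default arguments, accuracy_score returns the fraction of predictions that exactly match the true labels. Since the test set here has 75 rows (the first number in x_test.shape), 0.96 corresponds to 72 correct predictions and 3 errors (72 / 75 = 0.96). The sketch below reproduces that arithmetic; the accuracy helper and the label lists are hypothetical, not scikit-learn code.

```python
def accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Illustrative labels: 75 test samples, 72 of them predicted correctly
y_true = [0] * 75
y_pred = [0] * 72 + [1] * 3
print(accuracy(y_true, y_pred))  # 0.96
```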

Best answer

You are printing the shapes of your data arrays on this line:

print(x_train.shape, x_test.shape) 

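x_train and x_test each have 75 rows (data points, i.e. individual flowers) and 4 columns (features). The iris dataset has 150 samples, and its 4 features are sepal length, sepal width, petal length and petal width. Unless you have an odd number of data points, the two shapes should be identical, because you perform a 50/50 train/test split on this line: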

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size =.5)
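To see where (75, 4) comes from, and what happens with an odd sample count, the shapes can be predicted without running the split. This sketch assumes the rounding used in scikit-learn's implementation for a float test_size (test rows = ceil(test_size * n_samples), train rows = the remainder); check it against your installed version. split_shapes is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_shapes(n_samples, n_features, test_size):
    """Hypothetical helper: predicted (train, test) shapes for a float test_size.

    Assumes the test set gets ceil(test_size * n_samples) rows and the
    train set gets the rest; the number of columns (features) is unchanged.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return (n_train, n_features), (n_test, n_features)

# Iris: 150 samples x 4 features, split 50/50
print(split_shapes(150, 4, 0.5))  # ((75, 4), (75, 4))
# An odd sample count cannot split evenly; the extra row goes to the test set
print(split_shapes(151, 4, 0.5))  # ((75, 4), (76, 4))
```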

A similar question, "python - What do the results of a Sci-Kit machine learning program represent?", can be found on Stack Overflow: https://stackoverflow.com/questions/57121135/
