machine-learning - Does test data for machine learning need column names?

Tags: machine-learning scikit-learn training-data test-data

Suppose I have the following training data:

Age:12   Height:150   Weight:100     Gender:M
Age:15   Height:145   Weight:80      Gender:F
Age:17   Height:147   Weight:110     Gender:F
Age:11   Height:144   Weight:130     Gender:M

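After I train on this data and get a model, if I need to make a prediction for a single test observation, do I need to send the data with column names, like this?

Age:13   Height:142   Weight:90

In some cases I have seen people send test data as an array without column names. I am not sure how the algorithm handles that.

Note: I am using Python with scikit-learn, and my training data is a DataFrame. So I am not sure whether my test data should also be in DataFrame format.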

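Best answer

Are you predicting Gender?

If so, then yes. Your input is a record containing the following columns: Age, Height and Weight.

Otherwise, you would be predicting on records that lack a Gender value. If your model does not allow missing fields/columns, you may get a KeyError.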
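To make this concrete, here is a minimal sketch of predicting Gender for one observation, using the four rows from the question. The choice of `DecisionTreeClassifier` and the names `feature_cols` and `new_obs` are assumptions made for illustration; the question does not say which estimator is used.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training data from the question.
train = pd.DataFrame({
    "Age": [12, 15, 17, 11],
    "Height": [150, 145, 147, 144],
    "Weight": [100, 80, 110, 130],
    "Gender": ["M", "F", "F", "M"],
})

# Features are every column except the target (hypothetical helper name).
feature_cols = ["Age", "Height", "Weight"]

# Assumed estimator; any scikit-learn classifier is used the same way.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train[feature_cols], train["Gender"])

# A single test observation with the same feature columns and no Gender.
new_obs = pd.DataFrame([{"Age": 13, "Height": 142, "Weight": 90}])

# Selecting feature_cols fixes the column order to match training.
pred = clf.predict(new_obs[feature_cols])
print(pred)
```

Selecting `new_obs[feature_cols]` before predicting is a small safeguard: it fails early if a feature column is missing and keeps the columns in the order used for fitting.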

"I am not sure whether my test data should also be in DataFrame format"

In short: yes.

Typically you do this:

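from sklearn.model_selection import train_test_split

# X is your input data; its format depends on how your model (pre)processes it.
# It could be a numeric matrix, a list of dicts, a list of strings, etc.
# y holds the target labels and clf is any scikit-learn estimator.
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Fit on the training split, then predict on the held-out split.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)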

So your training and test data are in the same format, or at least in compatible formats (for example, a pandas DataFrame is compatible with a list of dicts).
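As for why people can send a plain array without column names: scikit-learn estimators operate on a 2-D table of values, so an array works as long as its columns are in the same order as the training features. The sketch below compares both input forms on the data from the question. The estimator choice and the variable names are assumptions, and the warning filter is there because recent scikit-learn versions warn when a model fitted on a DataFrame is given input without feature names.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Features and labels from the question.
X_train = pd.DataFrame({
    "Age": [12, 15, 17, 11],
    "Height": [150, 145, 147, 144],
    "Weight": [100, 80, 110, 130],
})
y_train = ["M", "F", "F", "M"]

# Assumed estimator, fitted on a DataFrame with column names.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The same observation, once with column names and once as a bare array.
obs_df = pd.DataFrame([[13, 142, 90]], columns=X_train.columns)
obs_arr = np.array([[13, 142, 90]])  # order must be Age, Height, Weight

pred_df = clf.predict(obs_df)
with warnings.catch_warnings():
    # Silence the "no feature names" warning for the bare-array call.
    warnings.simplefilter("ignore")
    pred_arr = clf.predict(obs_arr)

# Both forms describe the same row, so the predictions agree.
same = bool((pred_df == pred_arr).all())
print(same)
```

The array form carries no names, so nothing can catch a wrong column order. Passing a DataFrame with the training column names is the safer default.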

Regarding "machine-learning - Does test data for machine learning need column names?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34823046/
