matlab - Classifying handwritten digits using PCA

Classify handwritten digits using PCA. Use 200 digits for the train phase and 20 for the test.

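I don't understand how PCA works as a classification method. I learned to use it for dimensionality reduction: we subtract the mean from the original data, then compute the covariance matrix and its eigenvalues and eigenvectors. From there, we keep the principal components and discard the rest. How should I classify a set of handwritten digits? How do I distinguish data belonging to different classes?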
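The dimensionality-reduction steps described above (center the data, compute the covariance matrix, take its eigenvectors, keep the leading ones) can be sketched in NumPy as follows. This is only an illustration: the function names `pca_fit` and `pca_transform` are hypothetical, and the random matrix stands in for 200 flattened digit images, which are not available here.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on X of shape (n_samples, n_features)."""
    mean = X.mean(axis=0)
    centered = X - mean                      # subtract the mean from the data
    cov = np.cov(centered, rowvar=False)     # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order], eigvals[order]

def pca_transform(X, mean, components):
    """Project data onto the kept principal components (the "scores")."""
    return (X - mean) @ components

# Placeholder data (assumption): 200 samples with 64 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))

mean, components, variances = pca_fit(X, n_components=2)
scores = pca_transform(X, mean, components)
```

Note that nothing in these steps uses class labels, which is why PCA by itself does not classify anything.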

Best answer

If you plot the scores obtained from PCA, you will see that some classes form a cluster.

A simple R script:

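library(R.matlab)  # provides readMat
data <- readMat(file.path("testzip.mat"))$testzip
pca <- princomp(t(data))  # princomp expects observations in rows
plot(pca$scores)  # plots the first two columns of the score matrix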

produces a plot like this:

R plot of the digits

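I could not color it, because the .mat file does not contain the vector of class labels for the digits. Still, you can see at least one cluster, which would help you separate that single class from the others (the rest looks like noise?).

Olivier Grisel (a scikit-learn contributor) also answered your question on metaoptimize: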

How to use PCA for classification?

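He says that PCA is really an unsupervised dimensionality reduction method, but that there are some fancy ways to use it for classification: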

Actually I have found another way to do "classification with PCA" in this talk by Stéphane Mallat: each class is approximated by an affine manifold with the first component as direction and the centroid as offset and new samples are classified by measuring distance to the nearest manifold with an orthogonal projection.

Talk: https://www.youtube.com/watch?v=lFJ7KdSdy0k (very interesting for CV people)

Related papers: http://www.cmap.polytechnique.fr/scattering/
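A rough reading of the quoted idea, sketched in NumPy: fit one centroid and one leading principal direction per class, then assign a new sample to the class whose affine subspace is closest under orthogonal projection. This is an interpretation of the quote, not the method from the talk or the papers. The names `fit_class_subspaces` and `predict_nearest_subspace` are hypothetical, and the synthetic 3-D data only stands in for digit images.

```python
import numpy as np

def fit_class_subspaces(X, y, n_directions=1):
    """For each class, store its centroid and leading principal direction(s)."""
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        centroid = Xc.mean(axis=0)
        # Rows of vt are the principal directions of the centered class data.
        _, _, vt = np.linalg.svd(Xc - centroid, full_matrices=False)
        models[label] = (centroid, vt[:n_directions])
    return models

def predict_nearest_subspace(models, X):
    """Assign each sample to the class whose affine subspace is nearest."""
    labels = list(models)
    dists = []
    for label in labels:
        centroid, dirs = models[label]
        diff = X - centroid
        projected = diff @ dirs.T @ dirs          # orthogonal projection onto the directions
        dists.append(np.linalg.norm(diff - projected, axis=1))  # residual distance
    return np.array(labels)[np.argmin(np.stack(dists, axis=1), axis=1)]

# Synthetic data (assumption): class 0 lies along the x axis through the origin,
# class 1 lies along the z axis through (0, 5, 0).
rng = np.random.default_rng(0)
t0 = 3.0 * rng.normal(size=(50, 1))
t1 = 3.0 * rng.normal(size=(50, 1))
X0 = t0 * np.array([1.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(50, 3))
X1 = np.array([0.0, 5.0, 0.0]) + t1 * np.array([0.0, 0.0, 1.0]) + 0.1 * rng.normal(size=(50, 3))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)

models = fit_class_subspaces(X_train, y_train)
preds = predict_nearest_subspace(models, np.array([[3.0, 0.1, 0.0], [0.0, 5.0, -4.0]]))
```

Unlike the usual use of PCA, this variant needs class labels at training time, because one subspace is fitted per class.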

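But I think that is overkill for you. If you have class labels, you can fit any classifier on the PCA output. If not, try a density-based clustering method (e.g. DBSCAN), check whether it finds the cluster you see in the plot, and use it to classify new images (e.g. by distance to the cluster means).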
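For the labeled case, a minimal sketch in NumPy might look like this. A nearest-centroid rule is used here only as a stand-in for "any classifier"; it is the same distance-to-mean rule suggested above for clusters. The function names are hypothetical, and the synthetic two-class data merely mirrors the 200/20 train/test split from the question rather than real digit images.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the first k principal directions (as columns)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def fit_centroids(scores, y):
    """Compute one centroid per class in PCA space."""
    labels = np.unique(y)
    centroids = np.stack([scores[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict_nearest_centroid(scores, labels, centroids):
    """Assign each sample to the class with the closest centroid."""
    d = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Synthetic stand-in for digit images (assumption): two well-separated classes.
rng = np.random.default_rng(0)
n_features = 64
class_means = rng.normal(scale=5.0, size=(2, n_features))

def sample(n):
    y = rng.integers(0, 2, size=n)
    X = class_means[y] + rng.normal(size=(n, n_features))
    return X, y

X_train, y_train = sample(200)
X_test, y_test = sample(20)

# Fit PCA on the training set only, then reuse its mean and directions for the test set.
mean, components = fit_pca(X_train, k=2)
labels, centroids = fit_centroids((X_train - mean) @ components, y_train)
y_pred = predict_nearest_centroid((X_test - mean) @ components, labels, centroids)
accuracy = float(np.mean(y_pred == y_test))
```

The point of the design is that the test images are projected with the mean and components learned from the training images, so no information from the test set leaks into the model.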

Regarding "matlab - Classifying handwritten digits using PCA", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14610377/
