machine-learning - Alternatives to SVM

Tags: machine-learning

Among the disadvantages of SVMs, they say:

If the number of features is much greater than the number of samples, the method is likely to give poor performances.

What is a good alternative in this case?

Best answer

You can follow the libsvm guide for beginners. In Section C.1, it gives an answer along with an example for the situation you are asking about:

C.1 Number of instances << number of features

Many microarray data in bioinformatics are of this type. We consider the Leukemia data from the LIBSVM data sets (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets). The training and testing sets have 38 and 34 instances, respectively. The number of features is 7,129, much larger than the number of instances. We merge the two files and compare the cross validation accuracy of using the RBF and the linear kernels:

  • RBF kernel with parameter selection

    $ cat leu leu.t > leu.combined

    $ python grid.py leu.combined

    ...

    8.0 3.0517578125e-05 97.2222

    (Best C=8.0, γ=0.000030518 with five-fold cross-validation rate=97.2222%)

  • Linear kernel with parameter selection

    $ python grid.py -log2c -1,2,1 -log2g 1,1,1 -t 0 leu.combined

    ...

    0.5 2.0 98.6111

    (Best C=0.5 with five-fold cross-validation rate=98.61111%)

    Though grid.py was designed for the RBF kernel, the above way checks various C using the linear kernel (-log2g 1,1,1 sets a dummy γ).

The cross-validation accuracy of using the linear kernel is comparable to that of using the RBF kernel. Apparently, when the number of features is very large, one may not need to map the data.

In addition to LIBSVM, the LIBLINEAR software mentioned below is also effective for data in this case.
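For reference, the `-log2c -1,2,1 -log2g 1,1,1` arguments in the quoted linear-kernel run each describe a `begin,end,step` range of base-2 exponents. The sketch below shows which candidate values those ranges expand to. The helper name `expand_log2_range` is hypothetical and is not part of grid.py; it only mirrors the documented meaning of the flags.

```python
def expand_log2_range(begin, end, step):
    """Return [2**begin, 2**(begin+step), ...], up to and including 2**end."""
    values = []
    exponent = begin
    # Walk the exponents in either direction; a zero step yields no values.
    while (step > 0 and exponent <= end) or (step < 0 and exponent >= end):
        values.append(2.0 ** exponent)
        exponent += step
    return values


# -log2c -1,2,1 : candidate values of C
c_values = expand_log2_range(-1, 2, 1)
# -log2g 1,1,1 : a single dummy gamma, ignored by the linear kernel
gamma_values = expand_log2_range(1, 1, 1)

print("C candidates:", c_values)          # [0.5, 1.0, 2.0, 4.0]
print("gamma candidates:", gamma_values)  # [2.0]
```

This matches the reported result line `0.5 2.0 98.6111`: the best C is 0.5, and 2.0 is the single dummy γ.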

As you can see, you can use an SVM with a linear kernel and still get good results.
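To see the same effect without installing anything, here is a minimal pure-Python sketch. It makes several assumptions that are not in the guide: the data is synthetic (40 samples, 500 features, only 20 of them informative) rather than the Leukemia set, and the trainer is a simple Pegasos-style stochastic subgradient method on the hinge loss, not LIBSVM or LIBLINEAR. All function names and parameter values are illustrative.

```python
import random


def make_data(n_samples=40, n_features=500, n_informative=20, seed=0):
    """Synthetic two-class data with far more features than samples."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        # Only the first n_informative features are shifted by the label;
        # the remaining features are pure Gaussian noise.
        row = [rng.gauss(0.0, 1.0) + (label if j < n_informative else 0.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=10, seed=0):
    """Pegasos-style subgradient descent for a linear SVM without a bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            violated = y[i] * dot(w, X[i]) < 1.0
            shrink = 1.0 - eta * lam       # step on the L2 regularizer
            w = [shrink * wj for wj in w]
            if violated:                   # step on the hinge loss
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def cross_val_accuracy(X, y, k=5):
    """Plain k-fold cross-validation; fold f holds samples f, f+k, f+2k, ..."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        w = train_linear_svm(train_X, train_y)
        correct += sum(predict(w, X[i]) == y[i] for i in test_idx)
    return correct / n


X, y = make_data()
accuracy = cross_val_accuracy(X, y)
print("five-fold cross-validation accuracy (linear):", accuracy)
```

On real data you would more likely use LIBSVM with `-t 0` as in the quoted commands, or LIBLINEAR, which the guide recommends for this case.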

Regarding machine-learning - Alternatives to SVM, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16783659/
