A commonly cited disadvantage of SVMs:
If the number of features is much greater than the number of samples, the method is likely to give poor performance.
What are good alternatives in this case?
Best answer
You can follow the libsvm guide for beginners; Section C.1 gives you the answer, along with an example of exactly the situation you are asking about:
C.1 Number of instances << number of features
Many microarray data in bioinformatics are of this type. We consider the Leukemia data from the LIBSVM data sets (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets). The training and testing sets have 38 and 34 instances, respectively. The number of features is 7,129, much larger than the number of instances. We merge the two files and compare the cross-validation accuracy of using the RBF and the linear kernels:
RBF kernel with parameter selection
$ cat leu leu.t > leu.combined
$ python grid.py leu.combined
...
8.0 3.0517578125e-05 97.2222
(Best C=8.0, γ=0.000030518 with five-fold cross-validation rate=97.2222%)
Linear kernel with parameter selection
$ python grid.py -log2c -1,2,1 -log2g 1,1,1 -t 0 leu.combined
...
0.5 2.0 98.6111
(Best C=0.5 with five-fold cross-validation rate=98.6111%)
Though grid.py was designed for the RBF kernel, the above way checks various C using the linear kernel (-log2g 1,1,1 sets a dummy γ).
The cross-validation accuracy of using the linear kernel is comparable to that of using the RBF kernel. Apparently, when the number of features is very large, one may not need to map the data.
In addition to LIBSVM, the LIBLINEAR software mentioned below is also effective for data in this case.
As you can see, you can use an SVM with a linear kernel and get good results.
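To see the guide's point in practice, here is a minimal sketch using scikit-learn's `LinearSVC` (which wraps the LIBLINEAR library mentioned above) on synthetic data of roughly the leukemia dataset's shape. The data, the informative-feature count, and the C grid are illustrative assumptions, not the actual benchmark:

```python
# Sketch: linear SVM when n_features >> n_samples (assumed synthetic data,
# not the LIBSVM leukemia set; shapes chosen to mimic it: 72 x 7129).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features = 72, 7129
X = rng.standard_normal((n_samples, n_features))

# Make labels depend linearly on a few features so they are learnable.
w = np.zeros(n_features)
w[:10] = 1.0
y = (X @ w > 0).astype(int)

# Five-fold cross-validation over a small grid of C, in the spirit of
# what grid.py does with -t 0 (linear kernel).
results = {}
for C in [0.5, 1.0, 8.0]:
    clf = LinearSVC(C=C, dual=True, max_iter=10000)
    scores = cross_val_score(clf, X, y, cv=5)
    results[C] = scores.mean()
    print(f"C={C}: mean CV accuracy = {results[C]:.3f}")
```

With this many features and so few samples, the dual formulation (`dual=True`) is the appropriate choice, and no kernel mapping is needed.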
On the topic of machine-learning alternatives to SVM, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16783659/