machine-learning - SVC classifier taking too much time for training

I am using an SVC classifier with a linear kernel to train my model. Training data: 42,000 records.

    from sklearn.svm import SVC

    # No kernel argument is passed, so SVC uses its default kernel ("rbf"), not a linear one.
    model = SVC(probability=True)
    model.fit(self.features_train, self.labels_train)
    y_pred = model.predict(self.features_test)
    train_accuracy = model.score(self.features_train, self.labels_train)
    test_accuracy = model.score(self.features_test, self.labels_test)

Training my model takes more than two hours. Am I doing something wrong? Also, what can be done to reduce the training time?

Thanks in advance.

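Best answer

There are several ways to speed up SVM training. Let n be the number of records and d the embedding dimensionality. I assume you are using scikit-learn.

Reduce the training set size. Quoting the docs: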

The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

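The O(n^2) complexity will most likely dominate the other factors, so training on fewer records will have the largest impact on time: under quadratic scaling, halving n cuts the fit time by roughly a factor of four. Besides random sampling, you could also try instance selection methods. For example, principal sample analysis has been proposed recently.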
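
A minimal sketch of the random-sampling idea, assuming synthetic data from `make_classification` as a stand-in for the real features and labels. The 25% fraction and the variable names are illustrative choices, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: any dense feature matrix works the same way).
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)

# Keep a stratified 25% random subsample for fitting; stratify=y preserves the class balance.
X_sub, X_rest, y_sub, y_rest = train_test_split(
    X, y, train_size=0.25, stratify=y, random_state=0
)

model = SVC(kernel="linear")  # probability is left at its default, False
model.fit(X_sub, y_sub)

# The records that were not sampled can serve as a quick sanity check.
held_out_accuracy = model.score(X_rest, y_rest)
print(X_sub.shape, held_out_accuracy)
```

Whether the accuracy lost by discarding records is acceptable depends on the data, so it is worth trying a few sample sizes and comparing held-out scores.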
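
Dimensionality reduction. As others have hinted in the comments, the embedding dimension also affects the runtime. Computing an inner product for a linear kernel is O(d), so dimensionality reduction can also reduce the runtime. In another question, latent semantic indexing was suggested specifically for TF-IDF representations.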
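
As a rough illustration of latent semantic indexing in front of a linear-kernel SVM, the sketch below chains `TfidfVectorizer`, `TruncatedSVD` and `SVC` in a scikit-learn pipeline. The toy corpus, the labels and `n_components=3` are hypothetical; a real TF-IDF matrix would typically be reduced to a few hundred components.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus with two topics (pets vs. finance).
docs = [
    "the cat sat on the mat", "dogs and cats are pets",
    "my cat chased the dog", "the dog barked at the cat",
    "stocks fell on monday", "the market rallied today",
    "investors sold their stocks", "bond prices rose in the market",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

lsi_svm = make_pipeline(
    TfidfVectorizer(),                             # sparse, high-dimensional TF-IDF features
    TruncatedSVD(n_components=3, random_state=0),  # LSI: project onto 3 latent dimensions
    SVC(kernel="linear"),                          # the SVM now works in the reduced space
)
lsi_svm.fit(docs, labels)

# Apply every step except the classifier to inspect the reduced representation.
reduced = lsi_svm[:-1].transform(docs)
print(reduced.shape)
```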
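
Parameters. Unless you need probability estimates, use SVC(probability=False), which is the default, because they "will slow down that method" (from the docs).

Implementation. To the best of my knowledge, scikit-learn just wraps LIBSVM and LIBLINEAR. I am speculating here, but you may be able to speed things up by using an efficient BLAS library, such as the one in Intel's MKL.

Different classifier. You may try sklearn.svm.LinearSVC, which is...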

[s]imilar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
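
The LinearSVC suggestion could look roughly like the sketch below. The synthetic data, the `StandardScaler` step and the parameter values are assumptions for illustration; `dual=False` is chosen here because there are far more samples than features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real features and labels.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# liblinear-based linear SVM; scaling the features usually helps it converge.
clf = make_pipeline(
    StandardScaler(),
    LinearSVC(C=1.0, dual=False, max_iter=5000),
)
clf.fit(X_train, y_train)

train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(train_accuracy, test_accuracy)
```

Note that LinearSVC has no `predict_proba`; it exposes `decision_function` scores instead, so code that relied on `probability=True` would need to be adapted.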

Moreover, a scikit-learn developer suggested the kernel_approximation module in a similar question.
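
One way the kernel_approximation idea might be applied, if a non-linear (RBF) kernel is wanted: map the data with `Nystroem` and train a linear model on the mapped features. This is a sketch under assumptions; the synthetic data, `gamma=0.05` and `n_components=200` are illustrative values, not recommendations from the linked discussion.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real features and labels.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

approx_rbf_svm = make_pipeline(
    StandardScaler(),
    # Approximate the RBF kernel feature map using 200 landmark samples.
    Nystroem(kernel="rbf", gamma=0.05, n_components=200, random_state=0),
    # A linear SVM on the mapped features stands in for a kernelized SVC.
    LinearSVC(dual=False),
)
approx_rbf_svm.fit(X_train, y_train)

# Apply every step except the classifier to see the approximate feature space.
mapped = approx_rbf_svm[:-1].transform(X_test)
approx_accuracy = approx_rbf_svm.score(X_test, y_test)
print(mapped.shape, approx_accuracy)
```

Increasing `n_components` generally makes the approximation closer to the exact kernel at the cost of more computation.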

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/53940258/
