machine-learning - Why does WEKA compute the wrong number of features for StringToWordVector?

Tags: machine-learning classification weka feature-extraction text-classification

I want to apply the StringToWordVector filter to a dataset in the WEKA application. I set the wordsToKeep parameter to 50, but the filter produces 78 words instead of 50. How can I fix the result?

My dataset: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection

Best answer

The -W option limits the number of words kept per class, so with 2 classes, setting -W 50 gives an overall limit of 100.

Source:

public String wordsToKeepTipText() {
  return "The number of words (per class if there is a class attribute "
      + "assigned) to attempt to keep.";
}

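Moreover, based on the source, this is not a strict constraint: it only determines where the sorted list of occurrence counts is pruned, so the number of words actually kept can vary.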

// sort the array
sortArray(array);
if (array.length < m_WordsToKeep) {
  // if there aren't enough words, set the threshold to
  // minFreq
  prune[z] = m_minTermFreq;
} else {
  // otherwise set it to be at least minFreq
  prune[z] = Math.max(m_minTermFreq,
      array[array.length - m_WordsToKeep]);
}
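
To see why the final count can land anywhere between the per-class setting and the number of classes times that setting, here is a minimal standalone sketch of threshold-based pruning. It does not use the WEKA API: the class name PruneSketch, the methods threshold and keptWords, and the toy word counts are all hypothetical, and the rule "keep a word if its count reaches the threshold in any class" is an assumption about how the thresholds from the excerpt are applied.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PruneSketch {

    // Per-class threshold, mirroring the excerpt: the count of the
    // wordsToKeep-th most frequent word, but never below minTermFreq.
    static int threshold(int[] counts, int wordsToKeep, int minTermFreq) {
        int[] sorted = counts.clone();
        Arrays.sort(sorted); // ascending order
        if (sorted.length < wordsToKeep) {
            return minTermFreq;
        }
        return Math.max(minTermFreq, sorted[sorted.length - wordsToKeep]);
    }

    // Assumption: a word is kept if its count in some class is at least
    // that class's threshold. Ties at the threshold are all kept, and a
    // word kept for several classes is only counted once.
    static Set<String> keptWords(List<Map<String, Integer>> perClassCounts,
                                 int wordsToKeep, int minTermFreq) {
        Set<String> kept = new TreeSet<>();
        for (Map<String, Integer> counts : perClassCounts) {
            int[] values = counts.values().stream()
                    .mapToInt(Integer::intValue).toArray();
            int t = threshold(values, wordsToKeep, minTermFreq);
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() >= t) {
                    kept.add(e.getKey());
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy counts, invented for illustration only.
        Map<String, Integer> ham = new HashMap<>();
        ham.put("you", 9);
        ham.put("call", 5);
        ham.put("home", 5);
        ham.put("late", 1);

        Map<String, Integer> spam = new HashMap<>();
        spam.put("free", 8);
        spam.put("call", 7);
        spam.put("prize", 2);

        // wordsToKeep = 2 per class, minTermFreq = 1
        System.out.println(keptWords(Arrays.asList(ham, spam), 2, 1));
    }
}
```

With wordsToKeep set to 2, the "ham" class keeps three words because "call" and "home" tie at the threshold, the "spam" class keeps two, and "call" appears in both, so the union has four words rather than 2 or 2 x 2. The same two effects, ties and overlap between classes, would explain a result such as 78 for a setting of 50 with two classes.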

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/34462687/
