analytics - Stop word removal in JavaScript

Tags: analytics data-mining javascript stemming

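Hello, I am looking for a library that can remove stop words from text in JavaScript. My end goal is to calculate tf-idf and then convert a given document into a vector space, all in JavaScript. Can anyone point me to a library that would help me do this? Even a library that only removes stop words would be great.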

Best answer

Use the stop words provided by the NLTK library:

const stopwords = ['i','me','my','myself','we','our','ours','ourselves','you','your','yours','yourself','yourselves','he','him','his','himself','she','her','hers','herself','it','its','itself','they','them','their','theirs','themselves','what','which','who','whom','this','that','these','those','am','is','are','was','were','be','been','being','have','has','had','having','do','does','did','doing','a','an','the','and','but','if','or','because','as','until','while','of','at','by','for','with','about','against','between','into','through','during','before','after','above','below','to','from','up','down','in','out','on','off','over','under','again','further','then','once','here','there','when','where','why','how','all','any','both','each','few','more','most','other','some','such','no','nor','not','only','own','same','so','than','too','very','s','t','can','will','just','don','should','now']

Then simply pass your string to the following function:

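function remove_stopwords(str) {
    // Note: the lookup is case-sensitive and only periods are stripped.
    const res = []
    const words = str.split(' ')
    for (let i = 0; i < words.length; i++) {
       const word_clean = words[i].split(".").join("")  // drop periods
       if (!stopwords.includes(word_clean)) {
           res.push(word_clean)
       }
    }
    return res.join(' ')
}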

Example:

remove_stopwords("I will go to the place where there are things for me.")

Result:

I go place things
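
The result above keeps "I" because the list holds lowercase words and the lookup is case-sensitive, and only periods are stripped from each word. If that is not what you want, a variant along the following lines may help. This is only a sketch: the name removeStopwordsIgnoreCase, the shortened stopwordSet, and the choice of punctuation characters are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical variant (not from the original answer): case-insensitive lookup.
// A short sample list keeps the example self-contained; substitute the full list.
const stopwordSet = new Set(['i', 'me', 'will', 'to', 'the', 'where', 'there', 'are', 'for']);

function removeStopwordsIgnoreCase(text) {
  return text
    .split(/\s+/)                                      // split on any run of whitespace
    .map((word) => word.replace(/[.,!?;:"()]/g, ''))   // strip common punctuation
    .filter((word) => word.length > 0 && !stopwordSet.has(word.toLowerCase()))
    .join(' ');
}

console.log(removeStopwordsIgnoreCase('I will go to the place where there are things for me.'));
// prints "go place things"
```

A Set is used so that each lookup does not scan the whole list, which can matter once the list grows or the input is long.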

Just add any words that are not yet covered to your NLTK stopwords array.
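
For the tf-idf step mentioned in the question, something like the following could be a starting point once stop words have been removed. It is a minimal sketch, not a library API: the function names termFrequencies and tfidfVectors are made up here, and the weighting (term count divided by document length, times the natural log of N over document frequency) is just one common convention among several.

```javascript
// Hypothetical helpers: names and the exact tf-idf weighting are assumptions.
function termFrequencies(tokens) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// documents: array of token arrays, stop words already removed.
function tfidfVectors(documents) {
  // Count in how many documents each term appears.
  const docFreq = new Map();
  for (const tokens of documents) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  // A sorted vocabulary fixes the order of the vector dimensions.
  const vocabulary = [...docFreq.keys()].sort();
  const vectors = documents.map((tokens) => {
    const tf = termFrequencies(tokens);
    return vocabulary.map((term) => {
      const frequency = (tf.get(term) || 0) / (tokens.length || 1);
      const idf = Math.log(documents.length / docFreq.get(term));
      return frequency * idf;
    });
  });
  return { vocabulary, vectors };
}

const { vocabulary, vectors } = tfidfVectors([
  ['go', 'place', 'things'],
  ['go', 'home'],
]);
console.log(vocabulary); // [ 'go', 'home', 'place', 'things' ]
console.log(vectors);    // one numeric vector per document, aligned with vocabulary
```

With this weighting, a term that occurs in every document (here "go") gets a weight of 0, since log(N/N) is 0.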

Regarding analytics - Stop word removal in JavaScript, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5631422/
