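I have built a content aggregator and would like to add a tag cloud that represents current trends.
Unfortunately, this is quite complicated, because I have to find keywords that represent the context of each article.
For example, words such as I, was, the, amazing, and nice carry no context.
Any help would be appreciated! :)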
Best answer
Use NLTK, specifically its Stopwords corpus:
Besides regular content words, there is another class of words called stop words that perform important grammatical functions, but are unlikely to be interesting by themselves. These include prepositions, complementizers, and determiners. NLTK comes bundled with the Stopwords corpus, a list of 2400 stop words across 11 different languages (including English).
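As a rough sketch of how that corpus could feed a tag cloud: drop the stop words, then count what remains. The names `load_stop_words`, `tag_counts`, and `FALLBACK_STOP_WORDS` are made up for this illustration and are not part of NLTK. It assumes the corpus has been fetched once with `nltk.download("stopwords")`, and falls back to a tiny hand-written list if it has not. Note that opinion words like "amazing" and "nice" are probably not in NLTK's English list, so they would need to be added by hand.

```python
import re
from collections import Counter

# Tiny fallback list (an assumption for this sketch), used only when
# NLTK or its Stopwords corpus is not installed.
FALLBACK_STOP_WORDS = {"i", "was", "the", "a", "an", "and", "is", "it", "of", "to", "in"}


def load_stop_words(language="english"):
    """Return NLTK's stop words for `language`, or the fallback set."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        # LookupError means the corpus has not been downloaded yet.
        return set(FALLBACK_STOP_WORDS)


def tag_counts(text, stop_words, top_n=10):
    """Count the words in `text` that are not stop words.

    Returns up to `top_n` (word, count) pairs, most frequent first.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Very short tokens are skipped as well; this cutoff is a guess.
    counts = Counter(w for w in words if w not in stop_words and len(w) > 2)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Extend the standard list with words that carry no context here.
    stop_words = load_stop_words() | {"amazing", "nice"}
    article = "I was at the Python meetup. The Python talk on NLTK was amazing."
    for word, count in tag_counts(article, stop_words, top_n=5):
        print(word, count)
```

The counts can then be mapped to font sizes in the cloud. Raw frequency is only one possible weighting; comparing counts across articles or over time would be closer to "trends", but that is beyond what the answer describes.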
Regarding "python - A clever way to build a tag cloud?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2485800/