algorithm - Porter stemmer, step 1b

Tags: algorithm nlp stemming porter-stemmer

This is similar to the question [1] "porter stemming algorithm implementation question?", but goes further.

Basically, step 1b is defined as:

Step1b

`(m>0) EED -> EE     feed      ->  feed
                    agreed    ->  agree
(*v*) ED  ->        plastered ->  plaster
                    bled      ->  bled
(*v*) ING ->        motoring  ->  motor
                    sing      ->  sing`

My question is: why does feed stem to feed and not fe? Every online Porter stemmer I have tried returns feed, but as far as I can tell it should be fe.

My reasoning is:

`feed` does not pass `(m>0) EED -> EE`, because the measure of `feed` minus the suffix `eed` is `m(f)`, hence `m = 0`.

`feed` will pass `(*v*) ED ->`, as there is a vowel in the stem `fe` once the suffix `ed` is removed. So it will stem to `fe` at this point.
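The two conditions used in this reasoning can be checked with a small sketch. It assumes Porter's usual definitions (a, e, i, o, u are vowels, y counts as a vowel only after a consonant, and m counts the VC sequences in the form [C](VC)^m[V]). The helper names `is_consonant`, `measure` and `contains_vowel` are made up for illustration and do not come from any particular library.

```python
def is_consonant(word: str, i: int) -> bool:
    """Assumed Porter definition: a, e, i, o, u are vowels; y is a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # y is a consonant at the start of a word or when it follows a vowel
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        vowel = not is_consonant(stem, i)
        if prev_vowel and not vowel:
            m += 1
        prev_vowel = vowel
    return m


def contains_vowel(stem: str) -> bool:
    """The *v* condition: the stem contains at least one vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


print(measure("f"))          # 0, so (m>0) fails for feed minus eed
print(measure("agr"))        # 1, so (m>0) holds for agreed minus eed
print(contains_vowel("fe"))  # True, so (*v*) would hold for feed minus ed
```

So both observations in the reasoning are correct taken on their own: `m(f) = 0`, and `fe` does contain a vowel.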

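Can someone explain to me how the online Porter stemmers manage to come up with feed?

Thanks.

Best answer

This is because the stem `f` that remains after removing `eed` from "feed" contains no VC (vowel/consonant) sequence, so m = 0 and the `(m>0) EED -> EE` rule does not fire. The `(*v*) ED ->` rule is then never tried: within a step, only the rule with the longest matching suffix is considered, and for "feed" that is `EED`. Since its condition (m > 0) fails, the word is left unchanged. Check the condition of each step.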
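A minimal sketch of how this plays out, assuming Porter's convention that within one step only the rule with the longest matching suffix is considered and that nothing else is tried if its condition fails. It covers only the three rules quoted in the question, not the clean-up rules that follow a successful `ED`/`ING` removal, and the names `RULES` and `step1b` are hypothetical.

```python
def is_consonant(word: str, i: int) -> bool:
    # Assumed Porter definition: y is a vowel only after a consonant
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    # Number of VC sequences in [C](VC)^m[V]
    flags = [is_consonant(stem, i) for i in range(len(stem))]
    return sum(1 for a, b in zip(flags, flags[1:]) if not a and b)


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


# (suffix, replacement, condition on the remaining stem)
RULES = [
    ("eed", "ee", lambda stem: measure(stem) > 0),
    ("ed", "", contains_vowel),
    ("ing", "", contains_vowel),
]


def step1b(word: str) -> str:
    matching = [rule for rule in RULES if word.endswith(rule[0])]
    if not matching:
        return word
    # Only the longest matching suffix is considered; no fallback to shorter ones
    suffix, replacement, condition = max(matching, key=lambda rule: len(rule[0]))
    stem = word[: -len(suffix)]
    return stem + replacement if condition(stem) else word


for w in ("feed", "agreed", "plastered", "bled", "motoring", "sing"):
    print(w, "->", step1b(w))
# feed -> feed
# agreed -> agree
# plastered -> plaster
# bled -> bled
# motoring -> motor
# sing -> sing
```

With a fallback to the shorter `ed` suffix, "feed" would become "fe"; selecting the rule by suffix length first is what keeps it as "feed".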

Regarding "algorithm - Porter stemmer, step 1b", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36225293/
