I am trying to implement the Porter stemming algorithm, but I'm stuck on this part:
where the square brackets denote arbitrary presence of their contents. Using (VC){m} to denote VC repeated m times, this may again be written as
[C](VC){m}[V].
m will be called the \measure\ of any word or word part when represented in this form. The case m = 0 covers the null word. Here are some examples:
m=0   TR, EE, TREE, Y, BY.
m=1   TROUBLE, OATS, TREES, IVY.
m=2   TROUBLES, PRIVATE, OATEN, ORRERY.
I don't understand what this "measure" is or what it represents.
Best answer
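It looks like the measure is the number of times a run of vowels is immediately followed by a run of consonants, i.e. the number of (VC) groups. For example, "TROUBLES" has:
- the optional initial consonant run [C] = "TR";
- the first vowel-consonant group (VC) = "OUBL";
- the second vowel-consonant group (VC) = "ES";
- the optional final vowel run [V], which is empty.

So the measure is two: the number of times (VC) "matches".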
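The decomposition above can be sketched in C++. This is a minimal illustration under stated assumptions, not Porter's reference implementation: it assumes ASCII input, and it uses Porter's definition of a consonant (a letter other than A, E, I, O, U, and other than Y preceded by a consonant), which the quoted excerpt does not include. The names `isConsonant`, `Decomposition`, `decompose`, and `measure` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: Porter's consonant rule. A consonant is a letter other than
// A, E, I, O, U, and other than Y preceded by a consonant.
bool isConsonant(const std::string& w, std::size_t i) {
    switch (std::tolower(static_cast<unsigned char>(w[i]))) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            return false;
        case 'y':
            // Y at the start is a consonant; elsewhere it is a consonant
            // only when the preceding letter is a vowel.
            return i == 0 || !isConsonant(w, i - 1);
        default:
            return true;
    }
}

// Hypothetical helper type: a word split into the form [C](VC){m}[V].
struct Decomposition {
    std::string head;                 // optional initial consonant run [C]
    std::vector<std::string> groups;  // each (VC) group; m == groups.size()
    std::string tail;                 // optional final vowel run [V]
};

Decomposition decompose(const std::string& w) {
    Decomposition d;
    std::size_t i = 0;
    // Leading consonant run [C].
    while (i < w.size() && isConsonant(w, i)) d.head += w[i++];
    while (i < w.size()) {
        std::size_t start = i;
        while (i < w.size() && !isConsonant(w, i)) ++i;  // vowel run V
        if (i == w.size()) {                              // no C follows: this is [V]
            d.tail = w.substr(start);
            break;
        }
        while (i < w.size() && isConsonant(w, i)) ++i;   // consonant run C
        d.groups.push_back(w.substr(start, i - start));  // one complete (VC)
    }
    return d;
}

// The measure m is simply the number of (VC) groups.
int measure(const std::string& w) {
    return static_cast<int>(decompose(w).groups.size());
}
```

For example, `decompose("TROUBLES")` gives head "TR", groups "OUBL" and "ES", and an empty tail, so `measure("TROUBLES")` is 2; the other words in the quoted list give the m values shown there. Counting groups this way is equivalent to counting the positions where a vowel is directly followed by a consonant, since each (VC) group contains exactly one such boundary.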
Regarding "c++ - Confusion about the Porter stemming algorithm", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4520706/