c++ - Confusion about the Porter stemming algorithm

I am trying to implement the Porter stemming algorithm, but I am stuck at this point:

where the square brackets denote arbitrary presence of their contents. Using (VC){m} to denote VC repeated m times, this may again be written as
[C](VC){m}[V].
m will be called the \measure\ of any word or word part when represented in this form. The case m = 0 covers the null word. Here are some examples:
m=0    TR,  EE,  TREE,  Y,  BY.
m=1    TROUBLE,  OATS,  TREES,  IVY.
m=2    TROUBLES,  PRIVATE,  OATEN,  ORRERY.

I don't understand what this "measure" is or what it represents.

Best answer

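It looks like the measure is the number of times a vowel sequence is immediately followed by a consonant sequence. For example,

"TROUBLES" has:

Optional initial consonants [C] = "TR".

First vowel-consonant group (VC) = "OUBL".

Second vowel-consonant group (VC) = "ES".

Optional trailing vowels [V] is empty.

So the measure is two, the number of times (VC) "matched".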
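The counting described above can be sketched in C++ as follows. This is only a minimal illustration, not the reference implementation: the function names `isConsonant` and `measure` are made up for this example, and the handling of Y assumes Porter's definition of a consonant (any letter other than A, E, I, O, U, and other than Y preceded by a consonant).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if word[i] is a consonant, assuming Porter's
// definition (not A, E, I, O, U, and not a Y that follows a consonant).
bool isConsonant(const std::string& word, std::size_t i) {
    switch (std::toupper(static_cast<unsigned char>(word[i]))) {
        case 'A': case 'E': case 'I': case 'O': case 'U':
            return false;
        case 'Y':
            // A leading Y is a consonant; elsewhere Y is a vowel exactly
            // when the previous letter is a consonant.
            return i == 0 || !isConsonant(word, i - 1);
        default:
            return true;
    }
}

// Hypothetical helper: counts how many times a vowel run is followed by a
// consonant run, i.e. the m in [C](VC){m}[V].
int measure(const std::string& word) {
    int m = 0;
    bool prevVowel = false;
    for (std::size_t i = 0; i < word.size(); ++i) {
        const bool consonant = isConsonant(word, i);
        if (consonant && prevVowel) {
            ++m;  // a vowel run just ended, so one (VC) group is complete
        }
        prevVowel = !consonant;
    }
    return m;
}
```

Under these assumptions, `measure("TREE")` gives 0, `measure("TROUBLE")` gives 1, and `measure("TROUBLES")` gives 2, which agrees with the examples quoted in the question.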

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/4520706/