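I am trying to understand the Snowball stemming algorithm. HW90 has a similar question with examples, but it does not cover my case. The algorithm uses two regions, R1 and R2, defined as follows: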
R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.
R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.
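I don't understand what "the null region at the end of the word" is. Can someone give me some examples?
Best answer
A null region is an empty region: it contains no letters. You missed the examples on the documentation page: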
Below, R1 and R2 are shown for a number of English words,
    b   e   a   u   t   i   f   u   l
                      |<------------->|    R1
                              |<----->|    R2
Letter t is the first non-vowel following a vowel in beautiful, so R1 is iful. In iful, the letter f is the first non-vowel following a vowel, so R2 is ul.
    b   e   a   u   t   y
                      |<->|    R1
                        ->|<-  R2
In beauty, the last letter y is classed as a vowel. Again, letter t is the first non-vowel following a vowel, so R1 is just the last letter, y. R1 contains no non-vowel, so R2 is the null region at the end of the word.
    b   e   a   u
                ->|<-    R1
                ->|<-    R2
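To make the definitions concrete, here is a minimal Python sketch that computes R1 and R2 straight from the two quoted rules. It is not the Snowball implementation: the names `VOWELS`, `region_start` and `r1_r2` are made up for illustration, the input is assumed to be lowercase, and `y` is always treated as a vowel, as in the beauty example. The real English stemmer has extra handling for `y` and for a few special prefixes, which this sketch ignores.

```python
# Assumption: lowercase input, and y always counts as a vowel.
VOWELS = set("aeiouy")


def region_start(word: str, start: int = 0) -> int:
    """Index just past the first non-vowel that follows a vowel,
    looking only at word[start:]. Returns len(word) if there is no
    such non-vowel, i.e. the region is null (empty)."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> tuple[str, str]:
    """Return the text of R1 and R2 for a word."""
    r1 = region_start(word)      # R1 is found by scanning the whole word
    r2 = region_start(word, r1)  # R2 applies the same rule inside R1
    return word[r1:], word[r2:]


for w in ("beautiful", "beauty", "beau"):
    print(w, r1_r2(w))
```

A null region shows up here as the empty string: the region starts at `len(word)`, so the slice has nothing in it. For beau, no non-vowel ever follows a vowel, so both R1 and R2 come out empty.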
Source: Stack Overflow question "nlp - Snowball Stemmer: defining Null Region", https://stackoverflow.com/questions/39355994/