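I have used the Metaphone and Soundex encoders with the "Phonetic Token Filter" in Elasticsearch.
Metaphone works well for English words.
Soundex works well for both English and Hindi, and perhaps for many other languages too.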
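Which of these encoders is best suited to Hindi and, if possible, to other Indian languages?
I ask because the Elasticsearch website does not say which encoder to choose for which language.
Please also tell me which encoders you have used, and for which languages.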
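For context, the filter being asked about is configured in the index settings. Below is a minimal sketch, assuming the `analysis-phonetic` plugin is installed, that builds the request body for creating an index with a phonetic filter. The `type`, `encoder` and `replace` keys are the plugin's documented options; the names `my_phonetic` and `my_phonetic_analyzer`, and the helper function itself, are hypothetical and chosen only for illustration.

```python
import json


def phonetic_index_settings(encoder: str, replace: bool = False) -> dict:
    """Build a hypothetical index-creation body using the phonetic token filter.

    `encoder` is one of the plugin's encoder names, e.g. "metaphone",
    "double_metaphone", "soundex", "cologne", "nysiis", "caverphone2".
    With replace=False the original token is kept next to the phonetic one.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_phonetic": {  # hypothetical filter name
                        "type": "phonetic",
                        "encoder": encoder,
                        "replace": replace,
                    }
                },
                "analyzer": {
                    "my_phonetic_analyzer": {  # hypothetical analyzer name
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_phonetic"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent with PUT /<index-name>
    print(json.dumps(phonetic_index_settings("soundex"), indent=2))
```

Swapping the encoder is then a one-argument change, which makes it easy to index the same sample of Hindi names with several encoders and compare the matches.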
Best answer
Phonetic encoders are algorithms that index words by their pronunciation.
Wikipedia describes the main ones:
- Metaphone, Double Metaphone, and Metaphone 3: suitable for most English words, not just names. Metaphone algorithms are the basis of many popular spell checkers. Double Metaphone is the second generation of the algorithm.
- Soundex: developed to encode surnames for use in censuses. Soundex codes are four-character strings made of a single letter followed by three digits.
- Daitch–Mokotoff Soundex: a refinement of Soundex designed to better match surnames of Slavic and Germanic origin. Daitch–Mokotoff Soundex codes are strings of six digits.
- Cologne phonetics: similar to Soundex, but better suited to German words.
- New York State Identification and Intelligence System (NYSIIS): maps similar phonemes to the same letter. The result is a string the reader can pronounce without decoding.
- Match Rating Approach: developed by Western Airlines in 1977; it combines an encoding step with a range-based comparison technique.
- Caverphone: created to assist data matching between late 19th-century and early 20th-century electoral rolls, and optimized for accents found in parts of New Zealand.
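To make the Soundex description concrete (one letter followed by three digits), here is a minimal Python sketch of the classic American Soundex rules. It is an illustration written for this answer, not the code Elasticsearch runs internally; the function name `soundex` and the table layout are my own. It only looks at ASCII letters, so Hindi text would need to be transliterated into Latin script first (that step is not shown).

```python
# Digit classes of American Soundex; vowels and y get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of `name` ("" if no letters)."""
    # Keep only ASCII letters; this sketch ignores everything else.
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    # Start from the first letter's code so a same-coded second letter is dropped.
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w do not separate letters with the same code.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        # A vowel sets prev to "", so a repeated code after a vowel is kept.
        prev = code
    # Pad with zeros to letter + 3 digits.
    return "".join(result).ljust(4, "0")
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, and the romanized spellings `soundex("Ram")` and `soundex("Raam")` both give `R500`. Spelling variants like these are what makes Soundex attractive for transliterated Indian names.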
Reference:
Details of the algorithms above and their variants are on this Wikipedia page:
1. https://en.wikipedia.org/wiki/Phonetic_algorithm
Of these, Soundex is the best fit for Indian languages.
The following resources cover this in more detail:
1. Phonetic search for Indian languages
2. https://thottingal.in/blog/2009/07/26/indicsoundex/
Source: the question "elasticsearch - How to decide which encoder to use for which language in the Elasticsearch "Phonetic Token filter"?" on Stack Overflow: https://stackoverflow.com/questions/60897572/