elasticsearch - How to decide which encoder to use for which language in the Elasticsearch "Phonetic Token filter"?

Tags: elasticsearch phonetics metaphone

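I have used the metaphone and soundex encoders with the "Phonetic Token Filter" in Elasticsearch.

Metaphone works well for English words.

Soundex works well for both English and Hindi, and perhaps for many other languages.

Which of these encoders is best suited for Hindi, and, if possible, for other Indian languages?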

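  • soundex
  • metaphone
  • double_metaphone
  • refined_soundex
  • caverphone1 - English (New Zealand localization)
  • caverphone2 - English (New Zealand localization)
  • cologne - German
  • nysiis - improved Soundex
  • koelnerphonetik - German
  • haasephonetik - German
  • beider_morse - English and many European languages
  • daitch_mokotoff - Slavic and Yiddish surnames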
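Whichever encoder you pick from the list above is set through the `encoder` parameter of a `phonetic` token filter, which requires the `analysis-phonetic` plugin to be installed. The Python sketch below only builds the index-settings body as a dict; the filter, analyzer, and index names are hypothetical, and `replace: False` keeps the original token next to the phonetic code.

```python
import json

# Encoder names exactly as listed in the question.
ENCODERS = {
    "soundex", "metaphone", "double_metaphone", "refined_soundex",
    "caverphone1", "caverphone2", "cologne", "nysiis",
    "koelnerphonetik", "haasephonetik", "beider_morse", "daitch_mokotoff",
}

def phonetic_index_settings(encoder: str, replace: bool = False) -> dict:
    """Build index settings with a custom analyzer using the phonetic token filter."""
    if encoder not in ENCODERS:
        raise ValueError(f"unknown phonetic encoder: {encoder!r}")
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_phonetic": {             # hypothetical filter name
                        "type": "phonetic",
                        "encoder": encoder,
                        "replace": replace,      # False: emit original token and code
                    }
                },
                "analyzer": {
                    "my_phonetic_analyzer": {    # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_phonetic"],
                    }
                },
            }
        }
    }

if __name__ == "__main__":
    # Body for index creation, e.g. PUT /names (index name is hypothetical).
    print(json.dumps(phonetic_index_settings("soundex"), indent=2))
```

To compare encoders for a given language, one option is to create one index per encoder and inspect the tokens each produces for the same sample words with the `_analyze` API.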

I ask because the Elasticsearch website does not list which encoder to choose for which language.

Please also tell me which encoders you have used, and for which languages.

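    Best answer

    Phonetic encoders are algorithms that index words according to their pronunciation.

    An explanation of each is available on Wikipedia: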

    1. Metaphone, Double Metaphone, and Metaphone 3: suitable for most English words, not just names. Metaphone algorithms are the basis for many popular spell checkers. Double Metaphone is the second generation of this algorithm.
    2. Soundex: developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three digits.
    3. Daitch–Mokotoff Soundex: a refinement of Soundex designed to better match surnames of Slavic and Germanic origin. Daitch–Mokotoff Soundex codes are strings of six digits.
    4. Cologne phonetics: similar to Soundex, but better suited to German words.
    5. New York State Identification and Intelligence System (NYSIIS): maps similar phonemes to the same letter. The result is a string that a reader can pronounce without decoding.
    6. Match Rating Approach, developed by Western Airlines in 1977: combines an encoding with a range-comparison technique.
    7. Caverphone: created to help match data between late 19th-century and early 20th-century electoral rolls; optimized for the accents found in parts of New Zealand.
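    The Soundex description in item 2 can be made concrete with a minimal Python sketch of the classic American Soundex rules: keep the first letter; code b f p v as 1, c g j k q s x z as 2, d t as 3, l as 4, m n as 5, r as 6; vowels (and y) separate repeated codes while h and w do not; pad to four characters. It is an illustration only, not the encoder Elasticsearch ships, and it handles Latin letters only.

```python
# Letter-to-digit table of classic American Soundex.
_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}

def soundex(name: str) -> str:
    """Return the four-character Soundex code of a Latin-script name ('' if none)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")   # a first letter with the same code as the next is not repeated
    for c in letters[1:]:
        if c in "hw":
            continue                    # h and w do NOT separate letters with equal codes
        code = _CODES.get(c, "")
        if code == "":                  # vowel or y: resets, so a repeated code is written again
            prev = ""
            continue
        if code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")

if __name__ == "__main__":
    for n in ["Robert", "Rupert", "Ashcraft", "Sharma", "Sarma", "Krishna", "Krisna"]:
        print(n, soundex(n))
```

    For romanized Indian names, spelling variants such as Sharma/Sarma (both S650) and Krishna/Krisna (both K625) collapse to the same code, which is one reason Soundex can be useful for transliterated Indian text.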


    Reference:
    Details of the algorithms above and their variants are on the following Wikipedia page:
    1. https://en.wikipedia.org/wiki/Phonetic_algorithm

    Of these, Soundex is the best fit for Indian languages.
    The following resources cover this in more detail:
    1. Phonetic search for Indian languages
    2. https://thottingal.in/blog/2009/07/26/indicsoundex/
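    Soundex itself only codes Latin letters, so words written in native Indic scripts need extra handling. One idea that makes cross-script matching feasible is that Unicode's nine major Indic blocks (Devanagari U+0900 through Malayalam U+0D00) are each 0x80 code points wide and largely follow the same ISCII-derived layout. The sketch below folds text from any of these scripts onto Devanagari by block offset. It is a simplified assumption-based illustration, not the IndicSoundex algorithm from the post above; the layouts are not perfectly parallel (Tamil, for example, lacks many consonants), so real tools use explicit mapping tables.

```python
INDIC_START = 0x0900   # start of the Devanagari block
INDIC_END = 0x0D7F     # end of the Malayalam block
BLOCK_SIZE = 0x80      # each Indic script block spans 128 code points

def fold_to_devanagari(text: str) -> str:
    """Map each Indic character to the Devanagari character at the same block offset.

    Simplified illustration: assumes the blocks are parallel, which is only
    approximately true.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if INDIC_START <= cp <= INDIC_END:
            offset = (cp - INDIC_START) % BLOCK_SIZE
            out.append(chr(INDIC_START + offset))
        else:
            out.append(ch)   # non-Indic characters pass through unchanged
    return "".join(out)

if __name__ == "__main__":
    bengali = "\u0995\u09c3\u09b7\u09cd\u09a3"   # Bengali spelling of "Krishna"
    hindi = "\u0915\u0943\u0937\u094d\u0923"     # Devanagari spelling of "Krishna"
    print(fold_to_devanagari(bengali) == hindi)
```

    After folding, a phonetic coding step only has to handle one script's character set instead of nine.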

Regarding "elasticsearch - How to decide which encoder to use for which language in the Elasticsearch 'Phonetic Token filter'?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60897572/
