elasticsearch - How to highlight ngram tokens within a word using Elasticsearch

Tags: elasticsearch highlight

I want to highlight only the matched ngram, not the whole word. Example:

term: "Wo"
highlight should be: "<em>Wo</em>nderfull world!"
currently it is: "<em>Wonderfull</em> world!"

The mapping is:

{
  "global_search_1495732922733" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
        ...
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor_index_analyzer",
            "search_analyzer" : "meeteor_search_term_analyzer"
          },
          ...
        }
      }
    }
  }
}

The analysis settings are:

"analysis" : {
  "filter" : {
    "meeteor_stemmer" : {
      "name" : "english",
      "type" : "stemmer"
    },
    "meeteor_ngram" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor_search_term_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_index_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding",
        "meeteor_ngram"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_project_id_analyzer" : {
      "tokenizer" : "standard"
    }
  }
},

A concrete example:

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "Me"
        }
    },
    "highlight":{
      "fields": {
        "name": {}
      }
    }
}
'

The result is:

 "...highlight" : {
          "name" : [
            "Sad <em>Meeting</em>"
          ]
        }

Best answer

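The right way to get what you want is to use ngram as a tokenizer rather than as a token filter. A token filter runs after the standard tokenizer has already split the text into words, and every gram it emits keeps the start and end offsets of the whole word, so the highlighter wraps the whole word. An ngram tokenizer records separate offsets for each gram. You can do it like this: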

"analysis" : {
  "filter" : {
    "meeteor_stemmer" : {
      "name" : "english",
      "type" : "stemmer"
    }
  },
  "tokenizer" : {
    "meeteor_ngram_tokenizer" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor_search_term_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_index_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "meeteor_ngram_tokenizer"
    },
    "meeteor_project_id_analyzer" : {
      "tokenizer" : "standard"
    }
  }
},

Once the documents are reindexed with these settings, the highlighter marks only the matching ngram:

 "...highlight" : {
          "name" : [
            "Sad <em>Me</em>eting"
          ]
        }
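
To see why the two setups highlight differently, the offsets can be simulated outside Elasticsearch. The following is only a rough sketch under stated assumptions: the function names are hypothetical, a `\w+` regex stands in for the standard tokenizer, asciifolding is left out, and none of this is Elasticsearch's actual implementation. It shows only that a gram produced by a filter inherits the word's offsets, while a gram produced by a tokenizer carries its own.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard tokenizer: lowercased words with offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def ngrams_as_filter(text, min_gram=2, max_gram=15):
    """Token-filter style: each gram keeps the offsets of the whole word."""
    out = []
    for word, start, end in standard_tokens(text):
        for i in range(len(word)):
            for n in range(min_gram, max_gram + 1):
                if i + n <= len(word):
                    out.append((word[i:i + n], start, end))
    return out


def ngrams_as_tokenizer(text, min_gram=2, max_gram=15):
    """Tokenizer style: each gram carries its own start and end offsets."""
    lowered = text.lower()
    out = []
    for i in range(len(lowered)):
        for n in range(min_gram, max_gram + 1):
            if i + n <= len(lowered):
                out.append((lowered[i:i + n], i, i + n))
    return out


def highlight(text, tokens, term):
    """Wrap every span whose token equals the term in <em> tags.

    Simplification: overlapping spans are not merged.
    """
    wanted = term.lower()
    spans = sorted({(s, e) for tok, s, e in tokens if tok == wanted},
                   reverse=True)
    # Apply from right to left so earlier offsets stay valid.
    for start, end in spans:
        text = text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]
    return text


if __name__ == "__main__":
    doc = "Sad Meeting"
    print(highlight(doc, ngrams_as_filter(doc), "Me"))     # Sad <em>Meeting</em>
    print(highlight(doc, ngrams_as_tokenizer(doc), "Me"))  # Sad <em>Me</em>eting
```

On a running cluster, the `_analyze` API reports the real `start_offset` and `end_offset` of each token, which is a more reliable check than this simulation.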

Source: a similar question on Stack Overflow, "elasticsearch - How to highlight ngram tokens within a word using Elasticsearch": https://stackoverflow.com/questions/44205267/
