elasticsearch - Elasticsearch language analyzers: returning the indexed tokens after text analysis

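I am using Elasticsearch as a full-text search engine and indexing multilingual data. The text is analyzed by Elasticsearch at index time, and I would like to get back the tokens produced by that preprocessing (the terms that actually end up in the index). I know about the Analyze API, but running it for more than 200,000 documents is very time-consuming. I also came across the terms aggregation, but I am not sure how it works. Any ideas?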

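I use language analyzers in the mapping. Is there any out-of-the-box language detection when language analyzers are used, or is every document passed through every language analyzer? If it is the latter, does it make sense to run language detection and create a multi-field for each language? And what is the difference between using a language analyzer in the settings and in the mapping?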

PUT /index_sample
{
  "settings": {
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [
            "my_asciifolding",
            "my_apostrophe",
            "cjk_bigram"]
        }
      },
      "filter" : {
        "my_asciifolding" : {
          "type" : "asciifolding",
          "preserve_original" : true
        },
        "my_apostrophe" : {
          "type" : "apostrophe"
        }
      }
    }
  },
  "mappings" : {
    "properties": {
      "category_number" : {
        "type" : "integer",
        "fields" : {
          "raw" : {
            "type" : "keyword"
          }
        }
      },
      "product": {
        "type" : "text",
        "index" : true,
        "store" : true,
        "analyzer" : "my_analyzer",
        "fields" : {
          "german_field": {
            "type" : "text",
            "analyzer": "german"
          },
          "english_field" : {
            "type" : "text",
            "analyzer" : "english"
          },
          "chinese_field" : {
            "type" : "text",
            "analyzer" : "smartcn"
          },
          "spanish_field": {
            "type" : "text",
            "analyzer" : "spanish"
          },
          "czech_field" : {
            "type" : "text",
            "analyzer" : "czech"
          },
          "french_field": {
            "type" : "text",
            "analyzer" : "french"
          },
          "italian_field" : {
            "type" : "text",
            "analyzer" : "italian"
          },
          "dutch_field": {
            "type" : "text",
            "analyzer" : "dutch"
          },
          "portuguese_field": {
            "type" : "text",
            "analyzer" : "portuguese"
          }
        }  
      }
    }
  }
}

Best answer

If you want to see what an indexed field looks like, you can use the _analyze API (which I believe you don't want to do), or you can take a look at _termvectors:

GET /<index_name>/_termvectors/<doc_id>?fields=<field_name>
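
Since the question involves more than 200,000 documents, it may help to fetch term vectors in batches with the multi term vectors endpoint (`_mtermvectors`) instead of one request per document. The following is only a minimal Python sketch under some assumptions: the helper names (`batched`, `mtermvectors_body`, `tokens_by_field`), the batch size, and the sample response are hypothetical and not part of the original answer, and sending the request is left to whichever HTTP or Elasticsearch client you already use.

```python
from typing import Dict, Iterator, List


def batched(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` document ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def mtermvectors_body(doc_ids: List[str], fields: List[str]) -> dict:
    """Build a request body for POST /<index_name>/_mtermvectors."""
    return {
        "ids": doc_ids,
        "parameters": {
            "fields": fields,
            # Only the terms themselves are needed, so skip the extras.
            "positions": False,
            "offsets": False,
            "field_statistics": False,
        },
    }


def tokens_by_field(doc: dict) -> Dict[str, List[str]]:
    """Return the sorted tokens of each field from one term-vectors result.

    `doc` is either a single _termvectors response or one entry of the
    "docs" array in an _mtermvectors response.
    """
    term_vectors = doc.get("term_vectors", {})
    return {
        field: sorted(data.get("terms", {}))
        for field, data in term_vectors.items()
    }


if __name__ == "__main__":
    # Hand-written sample shaped like a term-vectors result (not real output).
    sample = {
        "_id": "1",
        "found": True,
        "term_vectors": {
            "product": {
                "terms": {
                    "shoes": {"term_freq": 1},
                    "red": {"term_freq": 1},
                }
            }
        },
    }
    for chunk in batched(["1", "2", "3"], size=2):
        print(mtermvectors_body(chunk, ["product"]))
    print(tokens_by_field(sample))
```

Each body produced by `mtermvectors_body` would be sent to `/<index_name>/_mtermvectors`, and `tokens_by_field` would then be applied to every entry of the `docs` array in the response. Whether the language sub-fields such as `product.german_field` can be requested in the same way is something to verify against your Elasticsearch version.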

A similar question about "elasticsearch - Elasticsearch language analyzers: returning the indexed tokens after text analysis" can be found on Stack Overflow: https://stackoverflow.com/questions/62638814/
