elasticsearch - Elasticsearch path_hierarchy tokenizes only half of the path

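I am trying to index paths with the path_hierarchy tokenizer, but it seems to tokenize only half of the path I provide. I have tried different paths and the result is always the same.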

My settings are:

{
    "settings" : { 
        "number_of_shards" : 5,
        "number_of_replicas" : 0,
        "analysis":{
            "analyzer":{
                "keylower":{
                    "type": "custom",
                    "tokenizer":"keyword",
                    "filter":"lowercase"
                },
                "path_analyzer": {
                    "type": "custom",
                    "tokenizer": "path_tokenizer",
                    "filter": [ "lowercase", "asciifolding", "path_ngrams" ]
                },
                "code_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "asciifolding", "code_stemmer" ]
                },
                "not_analyzed": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "lowercase", "asciifolding", "code_stemmer" ]
                }
            },
            "tokenizer": {
                "path_tokenizer": {
                  "type": "path_hierarchy"
                }
            },
            "filter": {
                "path_ngrams": {
                    "type": "edgeNGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "code_stemmer": {
                    "type": "stemmer",
                    "name": "minimal_english"
                }
            }
        }
    }
}
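To narrow down where a length limit could come from, here is a minimal Python sketch of what the path_hierarchy tokenizer does on its own, assuming the default "/" delimiter. The helper name path_hierarchy is hypothetical (it is not an Elasticsearch API), and buffer_size, skip, reverse and replacement are not modelled. Lucene may also treat the leading "//" and the trailing "/" slightly differently. In this model the last token is always the full input path, which suggests the truncation happens later in the analyzer chain rather than in the tokenizer.

```python
from __future__ import annotations


def path_hierarchy(path: str, delimiter: str = "/") -> list[str]:
    """Simplified model of the path_hierarchy tokenizer (hypothetical helper).

    Emits every prefix of `path` that ends just before a delimiter, followed by
    the full path. buffer_size, skip, reverse and replacement are not modelled.
    """
    # Prefixes that stop right before each delimiter (a delimiter at index 0
    # would give an empty prefix, so it is skipped).
    tokens = [path[:i] for i, ch in enumerate(path) if ch == delimiter and i > 0]
    # The whole path is always the last token.
    if path:
        tokens.append(path)
    return tokens


if __name__ == "__main__":
    for token in path_hierarchy("//cm/mirror/v1.2/Kolkata/ixin-packages/builds/"):
        print(token)
```

For "/one/two/three" this model yields "/one", "/one/two" and "/one/two/three", which is the behaviour described in the path_hierarchy documentation.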

My mapping is as follows:

{
  "dynamic": "strict",
  "properties": {
    "depot_path": {
      "type": "string",
      "analyzer": "path_analyzer"
    }
  },
  "_all": {
      "store": "yes",
      "analyzer": "english"
  }
}

When analyzing, I passed "//cm/mirror/v1.2/Kolkata/ixin-packages/builds/" as depot_path, and the tokens produced were:

               "key": "//c",
               "key": "//cm",
               "key": "//cm/",
               "key": "//cm/m",
               "key": "//cm/mi",
               "key": "//cm/mir",
               "key": "//cm/mirr",
               "key": "//cm/mirro",
               "key": "//cm/mirror",
               "key": "//cm/mirror/",
               "key": "//cm/mirror/v",
               "key": "//cm/mirror/v1",
               "key": "//cm/mirror/v1.",

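The token list above can be reproduced with a small Python model of the whole path_analyzer chain: path_hierarchy, then lowercase, then the path_ngrams edge n-gram filter with min_gram 3 and max_gram 15. This is a sketch under stated assumptions: the helper names are hypothetical, path_hierarchy is the simplified model (prefixes ending before each "/", plus the full path), asciifolding is omitted because the sample path is plain ASCII, and duplicate grams are collapsed so the result can be compared with the list above (a real _analyze response may repeat grams across positions).

```python
from __future__ import annotations


def path_hierarchy(path: str, delimiter: str = "/") -> list[str]:
    # Simplified path_hierarchy: prefixes ending before each delimiter, then the full path.
    tokens = [path[:i] for i, ch in enumerate(path) if ch == delimiter and i > 0]
    if path:
        tokens.append(path)
    return tokens


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    # Leading n-grams of length min_gram..max_gram; characters past max_gram are never emitted.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(path: str, min_gram: int = 3, max_gram: int = 15) -> list[str]:
    """Model of path_analyzer: path_hierarchy -> lowercase -> edge n-grams.

    Returns the distinct grams in first-seen order.
    """
    seen: list[str] = []
    for token in path_hierarchy(path):
        for gram in edge_ngrams(token.lower(), min_gram, max_gram):
            if gram not in seen:
                seen.append(gram)
    return seen


if __name__ == "__main__":
    for gram in analyze("//cm/mirror/v1.2/Kolkata/ixin-packages/builds/"):
        print(gram)
```

Every path_hierarchy token is a prefix of the input, and every edge n-gram is a prefix of its token. So with max_gram 15 no gram can ever be longer than the first 15 characters of the path, which is exactly where the list above stops ("//cm/mirror/v1.").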
Why isn't the whole path tokenized?

I expect tokens to be generated all the way up to //cm/mirror/v1.2/Kolkata/ixin-packages/builds/.
I tried increasing the buffer size, but no luck. Does anyone know what I am doing wrong?

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html

Best answer

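The "max_gram": 15 setting in your path_ngrams (edgeNGram) filter limits every token to 15 characters. If you increase "max_gram", you will see more of the path tokenized.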

Here is an example from my environment.

"max_gram": 15
input path: /var/log/www/html/web/
path_analyzer tokenized this path up to /var/log/www/ht, i.e. 15 characters

"max_gram": 100
input path: /var/log/www/html/web/WANTED
path_analyzer tokenized this path up to /var/log/www/html/web/WANTED, i.e. 28 characters (< 100)
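Both results follow from a simple rule, sketched below in Python under the same simplified model of path_analyzer (path_hierarchy, then lowercase, then edge n-grams). Since every path_hierarchy token is a prefix of the input and every edge n-gram is a prefix of its token, the longest emitted token is just the lowercased input cut at max_gram, or nothing if the path is shorter than min_gram. The helper name longest_token is hypothetical.

```python
from typing import Optional


def longest_token(path: str, max_gram: int, min_gram: int = 3) -> Optional[str]:
    """Longest token the simplified path_analyzer model emits (hypothetical helper).

    The longest gram comes from the full-path token, so it is the lowercased
    path truncated to max_gram characters.
    """
    text = path.lower()  # path_analyzer applies the lowercase filter
    if len(text) < min_gram:
        return None  # shorter than min_gram: no edge n-gram is emitted at all
    return text[:max_gram]


if __name__ == "__main__":
    for path, max_gram in [("/var/log/www/html/web/", 15),
                           ("/var/log/www/html/web/WANTED", 100)]:
        token = longest_token(path, max_gram)
        print(max_gram, token, len(token))
```

With max_gram 15 the first path stops at /var/log/www/ht (15 characters). With max_gram 100 the second path is emitted in full (28 characters), shown here as /var/log/www/html/web/wanted because the model applies the analyzer's lowercase filter.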

The original question, "elasticsearch - Elasticsearch path_hierarchy tokenizes only half of the path", is on Stack Overflow: https://stackoverflow.com/questions/35359579/
