ruby-on-rails - Elasticsearch Ngram Analyzer for searching partial MAC addresses

Tags: ruby-on-rails elasticsearch n-gram

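Using Elasticsearch (with Rails), I'm trying to index a field containing MAC addresses that use hyphens as separators, and to run search queries against it, so far without success: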

24-A4-3C-02-37-26



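Searching for the full MAC address (against the not_analyzed field) works fine, but searching through the custom analyzer does not work as expected.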

I've tested many options, including adjusting the min/max gram values, all without success.

With the mapping, settings, and query below, I run:
Box.search(q: "24-A4-3C-02-37-26").results.map(&:macaddress)

which produces a strange result:
["24-A4-3C-02-37-xx", "DC-9F-DB-F6-B2-xx", "C4-10-8A-13-53-xx", "C4-10-8A-13-54-xx", "C4-10-8A-13-52-xx"]

If I run it with the last octet removed ("24-A4-3C-02-37"), I get the following:
["DC-9F-DB-F6-B2-xx", "C4-10-8A-13-53-xx", "C4-10-8A-13-52-xx"]

This is also wrong.

I've checked the analyzer with the _analyze API, and the output looks fine:
curl "localhost:9205/boxes/_analyze?analyzer=ngram_analyzer&pretty=true" -d "24-A4-3C-02-37-26"

which produces:
{
  "tokens" : [ {
    "token" : "24",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "24-",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "24-A",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 3
  }, {
  .........
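The _analyze output above is consistent with an edgeNGram tokenizer that has no token_chars set. In that case the whole input is treated as a single token, and the tokenizer emits every prefix from min_gram to max_gram characters long. The plain-Ruby sketch below mimics that behavior so you can inspect the generated terms offline. edge_ngrams is a hypothetical helper, not part of Elasticsearch or any gem, and it ignores offsets and positions.

```ruby
# Hypothetical helper: approximates an edgeNGram tokenizer with no token_chars,
# i.e. the whole input is one token and every prefix of length
# min_gram..max_gram becomes a term. No lowercasing is applied, matching a
# custom analyzer that has no token filters.
def edge_ngrams(text, min_gram: 2, max_gram: 17)
  upper = [max_gram, text.length].min
  return [] if upper < min_gram

  (min_gram..upper).map { |n| text[0, n] }
end

terms = edge_ngrams("24-A4-3C-02-37-26")
p terms.first(3)  # prints ["24", "24-", "24-A"]
p terms.last      # prints "24-A4-3C-02-37-26"
p terms.length    # prints 16
```

A full MAC address is 17 characters, which equals max_gram, so the last term is the complete address. Because the analyzer has no filters, the terms keep their original upper case.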

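So I can only guess that something is wrong with the actual query. I even tried replacing the hyphen with its ASCII code or an escaped character.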
@search_definition[:query] = {
  multi_match: {
    query: options[:q],
    fields: [
      "macaddress.ngram",
      "macaddress.sortable^5",
        ...

My settings look like this:
settings analysis: {
  analyzer: {
    ngram_analyzer: {
      type: 'custom',
      tokenizer: 'my_tokenizer',
    }
  },
  tokenizer: {
    my_tokenizer: {
      type: "edgeNGram",
      min_gram: 2,
      max_gram: 17,
      # token_chars: [ "letter", "digit" ]
    }
  }
} do

  mapping do
    indexes :macaddress, type: 'multi_field', fields: {
      raw: { type: "string" },
      sortable: { type: "string", index: "not_analyzed" },
      ngram: { type: "string", index_analyzer: :ngram_analyzer } #, search_analyzer: 'keyword' }
    }
  end
end

Can anyone suggest how to get this working?

Best answer

I verified the following setup:

PUT test
    {
        "settings" : {
            "analysis" : {
                "analyzer" : {
                    "ngram_analyzer" : {
                        "type": "custom",
                        "tokenizer" : "my_tokenizer"
                    }
                },
                "tokenizer" : {
                    "my_tokenizer" : {
                        "type" : "edgeNGram",
                        "min_gram" : "2",
                        "max_gram" : "17"
                    }
                }
            }
        },
        "mappings": {
          "boxes":{
            "properties": {
              "macaddress":{
                "type": "multi_field",
                "fields": {
                  "raw":{
                    "type": "string"
                  },
                  "sortable":{
                    "type": "string",
                    "index": "not_analyzed"
                  },
                  "ngram":{
                    "type": "string",
                    "index_analyzer": "ngram_analyzer"
                  }
                }
              }
            }
          }
        }
    }
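If you would rather build the same index body from Ruby than type the JSON by hand, you can express it as a plain Hash and serialize it with the standard json library. This is a sketch. index_body is a hypothetical name, and the code keeps the Elasticsearch 1.x-era options used above (multi_field, string, index_analyzer). You would still send the result as the body of PUT test with whichever HTTP client or Elasticsearch gem you already use.

```ruby
require "json"

# Hypothetical helper: the same settings and mapping as the JSON request above,
# written as a Ruby Hash.
def index_body
  {
    settings: {
      analysis: {
        analyzer: {
          ngram_analyzer: { type: "custom", tokenizer: "my_tokenizer" }
        },
        tokenizer: {
          my_tokenizer: { type: "edgeNGram", min_gram: "2", max_gram: "17" }
        }
      }
    },
    mappings: {
      boxes: {
        properties: {
          macaddress: {
            type: "multi_field",
            fields: {
              raw: { type: "string" },
              sortable: { type: "string", index: "not_analyzed" },
              ngram: { type: "string", index_analyzer: "ngram_analyzer" }
            }
          }
        }
      }
    }
  }
end

# Print the JSON that corresponds to the request body above.
puts JSON.pretty_generate(index_body)
```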

And some sample data:
PUT test/boxes/1
{
  "macaddress":"24-A4-3C-02-37-26"
}
PUT test/boxes/2
{
  "macaddress":"24-A4-3C-02-37-54"
}
PUT test/boxes/3
{
  "macaddress":"24-A4-3C-02-38-23"
}
PUT test/boxes/4
{
  "macaddress":"34-A4-3C-02-38-23"
}

And the search query:
GET test/boxes/_search
{
  "query": {
    "multi_match": {
      "query": "24-A4-3C-02",
      "fields": ["macaddress.ngram",
      "macaddress.sortable^5"]
    }
  }
}

The result is:
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.047079325,
      "hits": [
         {
            "_index": "test",
            "_type": "boxes",
            "_id": "1",
            "_score": 0.047079325,
            "_source": {
               "macaddress": "24-A4-3C-02-37-26"
            }
         },
         {
            "_index": "test",
            "_type": "boxes",
            "_id": "2",
            "_score": 0.047079325,
            "_source": {
               "macaddress": "24-A4-3C-02-37-54"
            }
         },
         {
            "_index": "test",
            "_type": "boxes",
            "_id": "3",
            "_score": 0.047079325,
            "_source": {
               "macaddress": "24-A4-3C-02-38-23"
            }
         }
      ]
   }
}
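The sketch below shows why exactly IDs 1, 2 and 3 come back for the prefix "24-A4-3C-02": it checks the query against each sample document's edge n-grams. It rests on a simplifying assumption. The whole query string is treated as one term, compared verbatim with no lowercasing or splitting on hyphens. That is what a keyword search analyzer (commented out in the question's mapping) would do. The sketch does not model multi_match scoring or whatever search-time analysis Elasticsearch actually applies to the field. edge_ngrams, DOCS and prefix_hits are hypothetical names.

```ruby
# Hypothetical helper: same edge n-gram approximation (whole input = one token).
def edge_ngrams(text, min_gram: 2, max_gram: 17)
  upper = [max_gram, text.length].min
  return [] if upper < min_gram

  (min_gram..upper).map { |n| text[0, n] }
end

# Sample documents from the answer, keyed by _id.
DOCS = {
  "1" => "24-A4-3C-02-37-26",
  "2" => "24-A4-3C-02-37-54",
  "3" => "24-A4-3C-02-38-23",
  "4" => "34-A4-3C-02-38-23"
}.freeze

# Assumption: the query is a single term that must equal one of the
# document's indexed edge n-grams exactly.
def prefix_hits(query, docs = DOCS)
  docs.select { |_id, mac| edge_ngrams(mac).include?(query) }.keys
end

p prefix_hits("24-A4-3C-02")     # prints ["1", "2", "3"]
p prefix_hits("24-A4-3C-02-37")  # prints ["1", "2"]
```

Under this assumption, a query shorter than min_gram (2 characters) can never match, and a complete 17-character address matches only its own document.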

This question originally appeared on Stack Overflow: https://stackoverflow.com/questions/29521371/
