elasticsearch - How to match split words like "i phone" in Elasticsearch

Tags: elasticsearch inverted-index

I created two indices, fashion and mobiles, each with a "name" field.

client.indices.create(index='fashion',body={"mappings": {"doc": {"properties": {"name": {"type": "string"} } } } })
client.indices.create(index='mobiles',body={"mappings": {"doc": {"properties": {"name": {"type": "string"} } } } })

For fashion, I added the following documents:
client.index(index='fashion',doc_type='blog',body={"name": "i shirts"})
client.index(index='fashion',doc_type='blog',body={"name": "i celekon"})
client.index(index='fashion',doc_type='blog',body={"name": "satsung"})

For mobiles:
client.index(index='mobiles',doc_type='blog',body={"name": "apple iphone 6s"})
client.index(index='mobiles',doc_type='blog',body={"name": "samsung galaxy s2"})
client.index(index='mobiles',doc_type='blog',body={"name": "apple iphone 5s"})

When I search with a match query like this:
search = "i phone"
test = client.search(
    index='mobiles,fashion',
    doc_type='blog',
    size=10,
    body={"query": {"bool": {"should": [
        {"match": {"name": {"query": search, "slop": 10, "max_expansions": 2}}},
        {"match_phrase_prefix": {"name": {"query": search, "slop": 10, "max_expansions": 2}}},
        {"match": {"name": {"query": search, "fuzziness": 1}}},
    ]}}},
)

I get the results in the following order:

i shirts , i celekon , apple iphone 6s , apple iphone 5s



How can I get the results in this order instead?

apple iphone 6s , apple iphone 5s, ....



How do sites like "amazon" and "flipkart" implement this kind of search?

Note: I am using the elasticsearch-py API for searching.

Best answer

You have to create a custom analyzer that uses the Word Delimiter Token Filter:

Named word_delimiter, it splits words into subwords and performs optional transformations on subword groups. Words are split into subwords with the following rules:

  1. split on intra-word delimiters (by default, all non alpha-numeric characters): "Wi-Fi" → "Wi", "Fi"
  2. split on case transitions: "PowerShot" → "Power", "Shot"
  3. split on letter-number transitions: "SD500" → "SD", "500"
  4. leading and trailing intra-word delimiters on each subword are ignored: "//hello---there, dude" → "hello", "there", "dude"
  5. trailing "'s" are removed for each subword: "O’Neil’s" → "O", "Neil"
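
The rules above can be approximated in a few lines of plain Python. This is only a rough sketch for building intuition, not the Lucene filter itself: the function name `split_subwords` is hypothetical, it handles one ASCII token at a time, and it ignores the filter's many configuration options.

```python
import re

def split_subwords(token):
    """Rough approximation of the word_delimiter splitting rules for one token."""
    # Rule 5: drop a trailing possessive 's (straight or curly apostrophe).
    token = re.sub(r"['’]s$", "", token)
    subwords = []
    # Rules 1 and 4: split on runs of non-alphanumeric characters,
    # which also discards leading and trailing delimiters.
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        if not chunk:
            continue
        # Rule 2: mark lower-to-upper case transitions.
        # Rule 3: mark letter-to-digit and digit-to-letter transitions.
        marked = re.sub(
            r"(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])",
            " ",
            chunk,
        )
        subwords.extend(marked.split())
    return subwords

print(split_subwords("PowerShot"))  # ['Power', 'Shot']
print(split_subwords("SD500"))      # ['SD', '500']
print(split_subwords("iPhone"))     # ['i', 'Phone']
print(split_subwords("iphone"))     # ['iphone']
```

Note that the all-lowercase "iphone" stays a single subword in this sketch, because it contains no case transition to split on.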


I think you are looking for the second rule. If you index iPhone, it will produce the tokens "i" and "Phone", which is exactly what you are looking for.

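One thing to keep in mind is the "preserve_original" parameter: set it to true so that the original word is kept alongside the subwords. This matters because users may search for either i Phone or iPhone, and both should still match and score.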
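
As a minimal sketch of such an analyzer with elasticsearch-py, the index could be created roughly as follows. The names `name_word_delimiter`, `name_analyzer`, `INDEX_BODY` and `create_index` are illustrative assumptions, not part of the original answer. The mapping uses the `blog` type to match the `doc_type` used when indexing, and assumes an Elasticsearch version that still accepts the "string" field type and mapping types, as in the question.

```python
# Index settings and mapping for a custom analyzer built on word_delimiter.
# Assumption: an older Elasticsearch version, as in the question, where
# "string" fields and mapping types such as "blog" are still supported.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter name; the type and parameter are the
                # ones named in the answer.
                "name_word_delimiter": {
                    "type": "word_delimiter",
                    "preserve_original": True,
                }
            },
            "analyzer": {
                # Hypothetical analyzer name.
                "name_analyzer": {
                    "type": "custom",
                    # The whitespace tokenizer keeps punctuation and case
                    # intact so the filter can see them.
                    "tokenizer": "whitespace",
                    # Lowercase only after splitting; otherwise the case
                    # transitions would already be gone.
                    "filter": ["name_word_delimiter", "lowercase"],
                }
            },
        }
    },
    "mappings": {
        "blog": {
            "properties": {
                "name": {"type": "string", "analyzer": "name_analyzer"}
            }
        }
    },
}

def create_index(client, index_name):
    """Create an index with the custom analyzer using an elasticsearch-py client."""
    return client.indices.create(index=index_name, body=INDEX_BODY)
```

With a connected client this would be called as `create_index(client, 'mobiles')`. Keep in mind that the split relies on a case transition, so it helps for a name indexed as "iPhone"; an all-lowercase "iphone" has no transition and would remain a single token.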

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/33516502/
