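I am trying to get Elasticsearch to run a phonetic search over a list of cities. My goal is to find matching results even when the user misspells a name.
These are the steps I have taken: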
curl -X DELETE "localhost:9200/city/"
curl -X PUT "localhost:9200/city/?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}'
curl -X PUT "localhost:9200/city/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Mayrhofen"
}
'
curl -X PUT "localhost:9200/city/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Ischgl"
}
'
curl -X PUT "localhost:9200/city/_doc/3?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Saalbach"
}
'
curl -X GET "localhost:9200/city/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string": {
      "query": "Mayrhofen"
    }
  }
}
'
I tried querying with Mayerhofen and expected the same result as with Mayrhofen. The same problem occurs with Ischgl and Ichgl, and with Saalbach and Salbach.
Where is my mistake? Is something wrong?
Best answer
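The problem is that you are using the wrong encoder. metaphone does not produce matching codes for these spellings. You need to use double_metaphone for your input. It is based on an implementation of a phonetic algorithm. I suggest you study your data and the algorithm to make sure a phonetic algorithm is the best fit for your purpose.
Mapping: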
{
  "analysis": {
    "analyzer": {
      "double_meta_true_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "true_doublemetaphone"
        ]
      }
    },
    "filter": {
      "true_doublemetaphone": {
        "type": "phonetic",
        "encoder": "double_metaphone",
        "replace": true
      }
    }
  }
}
With this analyzer, the query matches the document. Here is why metaphone does not match:
GET http://localhost:9200/city2/_analyze
{
"field":"meta_true",
"text":"Mayrhofen"
}
yields
{
"tokens": [
{
"token": "MRHF",
"start_offset": 0,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 0
}
]
}
and analyzing the following
{
"field":"meta_true",
"text":"Mayerhofen"
}
yields
{
"tokens": [
{
"token": "MYRH",
"start_offset": 0,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 0
}
]
}
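The comparison above can be scripted instead of read by eye. The following is a minimal sketch, not part of the original answer: the helper names `analyze_tokens` and `same_tokens` are hypothetical, and the extraction uses plain `grep`/`sed` on the assumption that token values contain no escaped quotes (which holds for phonetic codes). The two sample responses are trimmed copies of the `_analyze` output shown above.

```shell
#!/usr/bin/env bash
# Extract the "token" values from an _analyze JSON response read on stdin
# and print them space-separated on one line.
# Assumption: token values contain no escaped double quotes.
analyze_tokens() {
  grep -o '"token"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"/\1/' \
    | tr '\n' ' ' \
    | sed 's/ $//'
}

# Compare the token sequences of two _analyze responses.
# Runs in a subshell so the temporary variables do not leak.
same_tokens() (
  a=$(printf '%s' "$1" | analyze_tokens)
  b=$(printf '%s' "$2" | analyze_tokens)
  if [ "$a" = "$b" ]; then
    echo "match: $a"
  else
    echo "differ: $a vs $b"
  fi
)

# Trimmed responses taken from the metaphone output above.
resp_mayrhofen='{"tokens":[{"token": "MRHF","start_offset": 0,"end_offset": 9}]}'
resp_mayerhofen='{"tokens":[{"token": "MYRH","start_offset": 0,"end_offset": 10}]}'

same_tokens "$resp_mayrhofen" "$resp_mayerhofen"
# prints: differ: MRHF vs MYRH
```

Against a live cluster, each response could be captured with `curl -s` from the `_analyze` endpoint and passed to `same_tokens` in the same way.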
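The two codes differ (MRHF versus MYRH), so the misspelled query cannot match the indexed term. Double_Metaphone works as follows: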
GET
{
"field":"doublemeta_true",
"text":"Mayerhofen"
}
and
{
"field":"doublemeta_true",
"text":"Mayrhofen"
}
each yield the token MRFN:
{
"tokens": [
{
"token": "MRFN",
"start_offset": 0,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 0
}
]
}
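The `_analyze` requests above reference the fields `meta_true` and `doublemeta_true` in an index named `city2`, but the answer does not show how that index was defined. Below is a minimal sketch of one way such a side-by-side index might be set up. Apart from `double_meta_true_analyzer`, `true_doublemetaphone`, and the two field names, everything here is an assumption made for illustration: the names `meta_true_analyzer` and `true_metaphone`, the helper functions, and the `ES_URL` variable. It also assumes the analysis-phonetic plugin is installed, since that plugin provides the `phonetic` token filter.

```shell
#!/usr/bin/env bash
# Print the JSON body for a comparison index with one text field per encoder.
# meta_true_analyzer and true_metaphone are hypothetical names.
comparison_index_body() {
  cat <<'JSON'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "meta_true_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "true_metaphone"]
          },
          "double_meta_true_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "true_doublemetaphone"]
          }
        },
        "filter": {
          "true_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          },
          "true_doublemetaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "meta_true": { "type": "text", "analyzer": "meta_true_analyzer" },
      "doublemeta_true": { "type": "text", "analyzer": "double_meta_true_analyzer" }
    }
  }
}
JSON
}

# Create the index. ES_URL and the default index name are assumptions;
# this function is only defined here, not called.
create_comparison_index() {
  curl -s -X PUT "${ES_URL:-localhost:9200}/${1:-city2}?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(comparison_index_body)"
}
```

Calling `create_comparison_index` (optionally with another index name) would send the request; with both fields in one index, the same text can be analyzed against each encoder without recreating anything.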
This question on amazon-web-services - Elasticsearch and phonetic search comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/62883549/