Consider the following very basic T-SQL query:
select * from Users
where FirstName like '%dm0e776467@mail.com%'
or LastName like '%dm0e776467@mail.com%'
or Email like '%dm0e776467@mail.com%'
How would I write this in Lucene? I have tried the following:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"firstName": "dm0e776467@mail.com"
}
},
{
"wildcard": {
"lastName": "dm0e776467@mail.com"
}
},
{
"wildcard": {
"email": "dm0e776467@mail.com"
}
}
]
}
}
}
{
"query": {
"multi_match": {
"query": "dm0e776467@mail.com",
"fields": [
"firstName",
"lastName",
"email"
]
}
}
}
{
"query": {
"query_string": {
"query": "\"dm0e776467@mail.com\"",
"fields": [
"firstName",
"lastName",
"email"
],
"default_operator": "OR",
"allow_leading_wildcard": true
}
}
}
It seems to me that there is no way to force Elasticsearch to treat the input string as a single substring?
Best answer
The standard (default) analyzer tokenizes this email address as follows:
GET _analyze
{
"text": "dm0e776467@mail.com",
"analyzer": "standard"
}
yielding
{
"tokens" : [
{
"token" : "dm0e776467",
...
},
{
"token" : "mail.com",
...
}
]
}
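To make the consequence of that token list concrete, here is a minimal Python sketch. It relies on the fact that wildcard is a term-level query, so the pattern is compared against each indexed token rather than against the original string. The helper wildcard_matches is hypothetical, and fnmatchcase only approximates Elasticsearch's * and ? semantics (it also understands [seq], which Elasticsearch does not).

```python
from fnmatch import fnmatchcase

# Tokens the standard analyzer produced for "dm0e776467@mail.com",
# copied from the _analyze output above.
indexed_tokens = ["dm0e776467", "mail.com"]

def wildcard_matches(pattern, tokens):
    """Return the tokens that a term-level wildcard pattern would match.

    Hypothetical helper: fnmatchcase stands in for Elasticsearch's
    wildcard matching, which is an approximation.
    """
    return [t for t in tokens if fnmatchcase(t, pattern)]

# The full address is not one of the indexed tokens, so nothing matches.
print(wildcard_matches("dm0e776467@mail.com", indexed_tokens))  # []

# A pattern that fits a single token does match.
print(wildcard_matches("*mail.com", indexed_tokens))  # ['mail.com']
```

No token contains the @ sign, so a wildcard pattern that includes it cannot match anything in a field analyzed this way.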
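This tokenization explains why multi_match works with any *mail.com suffix and why the wildcard queries fail. I suggest the following modifications to your mapping, based on this answer: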
PUT users
{
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"email": {
"type": "text",
"analyzer": "email"
},
"firstName": {
"type": "text",
"fields": {
"as_email": {
"type": "text",
"analyzer": "email"
}
}
},
"lastName": {
"type": "text",
"fields": {
"as_email": {
"type": "text",
"analyzer": "email"
}
}
}
}
}
}
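Note that I have used .as_email sub-fields on your firstName and lastName fields, since you may not want to force them to be mapped as emails by default. Then, after indexing a few samples: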
POST _bulk
{"index":{"_index":"users","_type":"_doc"}}
{"firstName":"abc","lastName":"adm0e776467@mail.coms","email":"dm0e776467@mail.com"}
{"index":{"_index":"users","_type":"_doc"}}
{"firstName":"xyz","lastName":"opr","email":"dm0e776467@mail.com"}
{"index":{"_index":"users","_type":"_doc"}}
{"firstName":"zyx","lastName":"dm0e776467@mail.com","email":"qwe"}
{"index":{"_index":"users","_type":"_doc"}}
{"firstName":"abc","lastName":"efg","email":"ijk"}
the wildcard queries work perfectly fine:
GET users/_search
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"email": "dm0e776467@mail.com"
}
},
{
"wildcard": {
"lastName.as_email": "dm0e776467@mail.com"
}
},
{
"wildcard": {
"firstName.as_email": "dm0e776467@mail.com"
}
}
]
}
}
}
Do check how this analyzer tokenizes the input under the hood, to avoid "surprising" query results:
GET users/_analyze
{
"text": "dm0e776467@mail.com",
"field": "email"
}
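As a rough preview of what that _analyze call should show, the filter chain can be approximated in Python. This is only a sketch under stated assumptions: it assumes the uax_url_email tokenizer hands the whole address to the filters as a single token, it substitutes [^\W\d_] for \p{L} because Python's re module has no \p{L}, and it does not reproduce the order in which Elasticsearch emits tokens. The function email_tokens is hypothetical.

```python
import re

# Patterns from the "email" pattern_capture filter in the mapping.
# [^\W\d_] is a stand-in for \p{L} (letters); this is an approximation.
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]

def email_tokens(token):
    """Approximate pattern_capture (keeping the original), lowercase, unique."""
    captured = [token]  # preserve_original: true
    for pattern in PATTERNS:
        for match in re.finditer(pattern, token):
            captured.extend(match.groups())
    seen, unique = set(), []
    for t in (c.lower() for c in captured):
        if t not in seen:  # the "unique" filter drops repeated tokens
            seen.add(t)
            unique.append(t)
    return unique

tokens = email_tokens("dm0e776467@mail.com")
print(tokens)
# The full address survives as a token, so an exact wildcard term can match it.
print("dm0e776467@mail.com" in tokens)  # True
```

The same list also shows the "surprising" side: short fragments such as single letters and digit runs become searchable tokens as well.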
Regarding tsql - multi-field wildcard search in Elasticsearch, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63020741/