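I am searching with the code below, and it works, but it only returns results when a complete word matches. I want results for partial queries as well (a match on an incomplete word of at least 3 characters). As a further condition, my documents have a field campus with values such as campus: "Bradford", campus: "Oxford", campus: "Harvard", and so on. I want the query to return only documents whose campus is Bradford or Oxford and in which Nel appears somewhere in the rest of the document.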
RestHighLevelClient client;
QueryBuilder matchQueryBuilder = QueryBuilders.queryStringQuery("Nel");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(matchQueryBuilder);
SearchRequest searchRequest = new SearchRequest("index_name");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
In SQL terms, it is like using where campus='Bradford' OR campus='Oxford'. The document contains "Nelson Mandela II". Currently the query works if I write Nelson, but I need it to work for the query Nel.
Best answer
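There are basically two possible ways to implement the use case you are looking for.
Solution 1: Using a wildcard query
Assume you have two fields:
name, of type text
campus, of type text
Below is what your Java code would look like: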
private static void wildcardQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Wildcard Query");
MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
//Using wildcard query
WildcardQueryBuilder nameClause = QueryBuilders.wildcardQuery("name", "nel*");
//Main Query
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(nameClause)
.should(campusClause_1)
.should(campusClause_2)
.minimumShouldMatch(1);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
//specify your index name in the below parameter
searchRequest.indices("my_wildcard_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
System.out.println("-----------------------------------------------------");
}
Note that if the fields above are of type keyword and you need an exact, case-sensitive match, you would need the following code instead:
TermQueryBuilder campusClause_2 = QueryBuilders.termQuery("campus", "Bradford");
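To illustrate why the wildcard pattern above is written in lowercase, here is a minimal plain-Java sketch that does not touch Elasticsearch at all. It assumes the name field uses the default standard analyzer, approximated here by splitting on non-alphanumeric characters and lowercasing, and it relies on the fact that a wildcard query is a term-level query whose pattern is not analyzed. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class WildcardSketch {

    /** Rough stand-in for the standard analyzer: split on non-alphanumerics and lowercase. */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"));
    }

    /** Translates a wildcard pattern (* = any run of characters, ? = exactly one) into a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if any indexed term matches the whole pattern; the pattern itself is not analyzed. */
    static boolean wildcardMatches(String fieldValue, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        return analyze(fieldValue).stream().anyMatch(term -> pattern.matcher(term).matches());
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("Nelson Mandela II", "nel*")); // true
        System.out.println(wildcardMatches("Nelson Mandela II", "Nel*")); // false: indexed terms are lowercase
    }
}
```

The sketch also hints at the cost of this approach: every candidate term has to be tested against the pattern at query time.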
Solution 2: Using the Edge Ngram tokenizer (preferred solution)
For this you need to use the Edge Ngram tokenizer.
Below is what your mapping would look like:
Mapping:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": "lowercase",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "my_analyzer"
},
"campus": {
"type": "text"
}
}
}
}
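To make the effect of this mapping concrete, the following plain-Java sketch approximates what an edge_ngram tokenizer with min_gram 2, max_gram 10 and token_chars letter and digit, followed by a lowercase filter, would emit for a value. It is an approximation for illustration only, not Elasticsearch code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EdgeNgramDemo {

    /** Approximates an edge_ngram tokenizer (token_chars: letter, digit) plus a lowercase filter. */
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit, mirroring token_chars.
        for (String word : text.split("[^\\p{L}\\p{Nd}]+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            // Emit prefixes of length minGram..maxGram; words shorter than minGram emit nothing.
            for (int len = minGram; len <= Math.min(maxGram, lower.length()); len++) {
                tokens.add(lower.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ne, nel, nels, nelso, nelson, ma, man, mand, mande, mandel, mandela]
        System.out.println(edgeNgrams("Nelson Mandela", 2, 10));
    }
}
```

Because nel is one of the indexed tokens, a plain match query for nel can find the document without any wildcard.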
Sample documents:
PUT my_index/_doc/1
{
"name": "Nelson Mandela",
"campus": "Bradford"
}
PUT my_index/_doc/2
{
"name": "Nel Chaz",
"campus": "Oxford"
}
Query DSL:
POST my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "nel"
}
}
],
"should": [
{
"match": {
"campus": "bradford"
}
},
{
"match": {
"campus": "oxford"
}
}
],
"minimum_should_match": 1
}
}
}
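The bool query above plays the role of the SQL condition name LIKE 'nel%' AND (campus = 'Bradford' OR campus = 'Oxford'). The plain-Java sketch below mimics that logic over an in-memory list to show why minimum_should_match matters: when a bool query contains a must clause, the should clauses are optional by default and only affect scoring. The Doc class, the method names and the third document are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class BoolQuerySketch {

    static final class Doc {
        final String name;
        final String campus;

        Doc(String name, String campus) {
            this.name = name;
            this.campus = campus;
        }
    }

    /** must clause: some word of the name starts with the prefix (case-insensitive). */
    static boolean nameMatches(Doc doc, String prefix) {
        String lowerPrefix = prefix.toLowerCase(Locale.ROOT);
        for (String word : doc.name.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.startsWith(lowerPrefix)) {
                return true;
            }
        }
        return false;
    }

    /** should clauses: counts how many of the given campuses the document matches. */
    static int campusMatches(Doc doc, List<String> campuses) {
        int count = 0;
        for (String campus : campuses) {
            if (campus.equalsIgnoreCase(doc.campus)) {
                count++;
            }
        }
        return count;
    }

    /** Keeps documents that satisfy the must clause and at least minimumShouldMatch should clauses. */
    static List<String> search(List<Doc> docs, String prefix, List<String> campuses, int minimumShouldMatch) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (nameMatches(doc, prefix) && campusMatches(doc, campuses) >= minimumShouldMatch) {
                hits.add(doc.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("Nelson Mandela", "Bradford"),
                new Doc("Nel Chaz", "Oxford"),
                new Doc("Nelly Smith", "Harvard")); // hypothetical extra document
        List<String> campuses = Arrays.asList("bradford", "oxford");
        System.out.println(search(docs, "nel", campuses, 1)); // [Nelson Mandela, Nel Chaz]
        System.out.println(search(docs, "nel", campuses, 0)); // [Nelson Mandela, Nel Chaz, Nelly Smith]
    }
}
```

With minimumShouldMatch set to 0 the Harvard document slips through, which is the behaviour the question wants to avoid.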
Java code:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Bool Query");
MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
//Plain old match query would suffice here
MatchQueryBuilder nameClause = QueryBuilders.matchQuery("name", "nel");
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(nameClause)
.should(campusClause_1)
.should(campusClause_2)
.minimumShouldMatch(1);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("my_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
}
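Note how I used a plain match query on the name field. I suggest you read up on analysis, analyzers, tokenizers and edge-ngram tokenizers. In the console, you should be able to see the total number of hits for the documents.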
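One caveat worth checking when reading about analyzers: unless a separate search_analyzer is set in the mapping, the query text is analyzed with the same edge n-gram analyzer as the indexed text, and a match query combines the resulting terms with OR by default. The plain-Java sketch below, which is only an approximation with hypothetical names, shows how that can produce unintended hits and how analyzing the query as whole lowercase words avoids them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SearchAnalyzerSketch {

    /** Approximates the edge n-gram analyzer: lowercase prefixes of each word. */
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : wholeWords(text)) {
            for (int len = minGram; len <= Math.min(maxGram, word.length()); len++) {
                tokens.add(word.substring(0, len));
            }
        }
        return tokens;
    }

    /** Approximates a simple search-time analyzer: lowercase whole words only. */
    static List<String> wholeWords(String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Default match semantics: a hit if at least one query term is among the indexed terms. */
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNgrams("Nelson Mandela", 2, 10);
        // Query analyzed with edge n-grams: "netflix" yields "ne", which is also an indexed prefix.
        System.out.println(matches(indexed, edgeNgrams("Netflix", 2, 10))); // true (unintended)
        // Query analyzed as whole words: no such accidental overlap.
        System.out.println(matches(indexed, wholeWords("Netflix")));        // false
        System.out.println(matches(indexed, wholeWords("Nel")));            // true
    }
}
```

If this matters for your data, one option is to set search_analyzer on the field to an analyzer that does not produce n-grams, while keeping the edge n-gram analyzer for indexing.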
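Similarly, you can use other query types, such as a term query, in the solutions above if you are looking for exact matches on a keyword field, and so on.
Updated answer: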
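Personally, I do not recommend Solution 1, because a wildcard query on even a single field wastes a lot of computing power, let alone on several fields. For substring matching across multiple fields, the best approach is to use a concept called copy_to and then apply the Edge N-Gram tokenizer to that combined field. So what exactly does the Edge N-Gram tokenizer do? Put simply, based on min_gram and max_gram, it breaks each token down into prefixes; for example, Zeppelin becomes
Zep, Zepp, Zeppe, Zeppel, Zeppeli, Zeppelin
and these values are inserted into the inverted index for that field. A very simple match query will then return the document, because the inverted index contains that substring.
About the copy_to field: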
The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field.
Using copy_to, we have the following mapping for the two fields campus and name.
Mapping:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": "lowercase",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"copy_to": "search_string"
},
"campus": {
"type": "text",
"copy_to": "search_string"
},
"search_string": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
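Note how, in the mapping above, name and campus are both copied into search_string via copy_to, and the Edge N-gram analyzer is applied only to search_string. Be aware that it consumes disk space, so you may want to step back and make sure you do not use this analyzer on every field, although that again depends on your use case.
Sample document: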
POST my_index/_doc/1
{
"campus": "Cambridge University",
"name": "Ramanujan"
}
Search query:
POST my_index/_search
{
"query": {
"match": {
"search_string": "ram"
}
}
}
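To see why this single match query can find a prefix from either field, here is a plain-Java approximation of what ends up in search_string for the sample document, assuming min_gram 3 and max_gram 10 as in the mapping. It simplifies the search side by treating the query as one lowercase term, and all class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class CopyToSketch {

    /** Simulates copy_to: every source value is analyzed into the same set of edge n-gram terms. */
    static Set<String> buildSearchString(int minGram, int maxGram, String... values) {
        Set<String> terms = new LinkedHashSet<>();
        for (String value : values) {
            for (String word : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                // Emit prefixes of length minGram..maxGram for each word.
                for (int len = minGram; len <= Math.min(maxGram, word.length()); len++) {
                    terms.add(word.substring(0, len));
                }
            }
        }
        return terms;
    }

    /** Simplified match: the lowercased query must be one of the indexed terms. */
    static boolean matches(Set<String> searchString, String query) {
        return searchString.contains(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // campus and name of the sample document both feed the combined field.
        Set<String> searchString = buildSearchString(3, 10, "Cambridge University", "Ramanujan");
        System.out.println(matches(searchString, "ram"));    // true, prefix of the name
        System.out.println(matches(searchString, "cam"));    // true, prefix of the campus
        System.out.println(matches(searchString, "ra"));     // false, shorter than min_gram
        System.out.println(matches(searchString, "bridge")); // false, not a prefix of any word
    }
}
```

The last case shows a limitation of edge n-grams: they cover prefixes of words only, not arbitrary substrings.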
This gives you simple Java code like the following:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
throws IOException {
System.out.println("-----------------------------------------------------");
System.out.println("Bool Query");
MatchQueryBuilder searchClause = QueryBuilders.matchQuery("search_string", "ram");
//Feel free to add multiple clauses
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(searchClause);
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("my_index");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
System.out.println(searchResponse.getHits().getTotalHits());
}
Hope that helps!
Regarding java - partial query with a where statement using the Elasticsearch 7 Java API, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62573908/