java - Configuring analyzers in Elasticsearch

Tags: java elasticsearch lucene

I wrote the program below to learn how to use Elasticsearch for full-text search. Searching for a single word works, but searching for a combination of words does not.

package in.blogspot.randomcompiler.elastic_search_demo;

import in.blogspot.randomcompiler.elastic_search_impl.Event;

import java.util.Date;

import org.elasticsearch.action.count.CountRequestBuilder;
import org.elasticsearch.action.count.CountResponse;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.FilterBuilder;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;

import com.fasterxml.jackson.core.JsonProcessingException;

public class ElasticSearchDemo
{
    public static void main( String[] args ) throws JsonProcessingException
    {
        Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("localhost", 9301));

        DeleteResponse deleteResponse1 = client.prepareDelete("chat-data", "event", "1").execute().actionGet();
        DeleteResponse deleteResponse2 = client.prepareDelete("chat-data", "event", "2").execute().actionGet();
        DeleteResponse deleteResponse3 = client.prepareDelete("chat-data", "event", "3").execute().actionGet();

        Event e1 = new Event("LOGIN", new Date(), "Agent1 logged into chat");
        String e1Json = e1.prepareJson();        
        System.out.println("JSON: " + e1Json);        
        IndexResponse indexResponse1 = client.prepareIndex("chat-data", "event", "1").setSource(e1Json).execute().actionGet();
        printIndexResponse("e1", indexResponse1);

        Event e2 = new Event("LOGOUT", new Date(), "Agent1 logged out of chat");
        String e2Json = e2.prepareJson();        
        System.out.println("JSON: " + e2Json);        
        IndexResponse indexResponse2 = client.prepareIndex("chat-data", "event", "2").setSource(e2Json).execute().actionGet();
        printIndexResponse("e2", indexResponse2);

        Event e3 = new Event("BREAK", new Date(), "Agent1 went on break in the middle of a chat");
        String e3Json = e3.prepareJson();        
        System.out.println("JSON: " + e3Json);        
        IndexResponse indexResponse3 = client.prepareIndex("chat-data", "event", "3").setSource(e3Json).execute().actionGet();
        printIndexResponse("e3", indexResponse3);

        FilterBuilder filterBuilder = FilterBuilders.termFilter("value", "break middle");

        SearchRequestBuilder searchBuilder = client.prepareSearch();
        searchBuilder.setPostFilter(filterBuilder);

        CountRequestBuilder countBuilder = client.prepareCount();
        countBuilder.setQuery(QueryBuilders.constantScoreQuery(filterBuilder));

        CountResponse countResponse1 = countBuilder.execute().actionGet();
        System.out.println("HITS: " + countResponse1.getCount());


        SearchResponse searchResponse1 = searchBuilder.execute().actionGet();
        SearchHits hits = searchResponse1.getHits();
        for(int i=0; i<hits.hits().length; i++) {
            SearchHit hit = hits.getAt(i);
            System.out.println("[" + i + "] " + hit.getId() + " : " +hit.sourceAsString());
        }

        client.close();
    }

    private static void printIndexResponse(String description, IndexResponse response) {
        System.out.println("Index response for: " + description);
        System.out.println("Index name: " + response.getIndex());
        System.out.println("Index type: " + response.getType());
        System.out.println("Index id: " + response.getId());
        System.out.println("Index version: " + response.getVersion());
    }
}

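The problem is that searching for "break middle" returns nothing, whereas I expect it to return the third event.

I understand that I need to configure an analyzer other than the default one for the text to be indexed correctly.

Can someone help me understand how to do this? A complete example would be great.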

Accepted answer

The problem is caused by your use of a Term filter:

FilterBuilder filterBuilder = FilterBuilders.termFilter("value", "break middle");

A term filter does not analyze the string it is given, so Elasticsearch looks for the exact term "break middle".

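The third document, however, was most likely broken into individual terms by ES at index time (the default standard analyzer also lowercases them), like this:

agent1
went
on
break
in
the
middle
of
a
chat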
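
The mismatch described above can be sketched in plain Java without a running cluster. This is only a toy model: the `analyze` method (lowercase, then split on non-alphanumeric characters) and the in-memory `INDEX` map are illustrative assumptions, not Elasticsearch's actual analyzer or index structures, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TermVsMatchSketch {

    // Toy inverted index: token -> ids of the documents that contain it.
    static final Map<String, Set<String>> INDEX = new HashMap<>();

    // Toy stand-in for an analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Index time: the text is analyzed and each token is stored separately.
    static void index(String id, String text) {
        for (String token : analyze(text)) {
            INDEX.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    // Behaves like a term filter: the input is looked up verbatim, with no analysis.
    static Set<String> termLookup(String term) {
        return INDEX.getOrDefault(term, Collections.emptySet());
    }

    // Behaves like an analyzed query with OR semantics:
    // the input is analyzed first, then the hits for each token are combined.
    static Set<String> matchLookup(String text) {
        Set<String> hits = new TreeSet<>();
        for (String token : analyze(text)) {
            hits.addAll(termLookup(token));
        }
        return hits;
    }

    public static void main(String[] args) {
        index("1", "Agent1 logged into chat");
        index("2", "Agent1 logged out of chat");
        index("3", "Agent1 went on break in the middle of a chat");

        // No single token equals "break middle", so the exact lookup finds nothing.
        System.out.println("term:  " + termLookup("break middle"));   // prints term:  []
        // Analyzing the input yields "break" and "middle", which both occur in document 3.
        System.out.println("match: " + matchLookup("break middle"));  // prints match: [3]
    }
}
```

The point of the sketch is that the index only ever contains single tokens, so an unanalyzed two-word string can never equal any of them.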

To fix this, use a filter or query that analyzes the string you pass in, such as a query_string query or a match query.

For example:

// "value" is the field searched in the question; "event" is the document type, not a field
QueryBuilder qb = QueryBuilders.matchQuery("value", "break middle");

Or:

QueryBuilder qb = QueryBuilders.queryString("break middle");

See the Java API documentation for Elasticsearch for more information.

A similar question about "java - Configuring analyzers in Elasticsearch" can be found on Stack Overflow: https://stackoverflow.com/questions/28091579/
