elasticsearch - In Spring Data Elasticsearch, can't aggregation queries be put in a repository implementation?

Tags: elasticsearch spring-data-elasticsearch

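I am using spring-boot-elasticsearch for the first time. I have now figured out how to describe my serial difference pipeline query with the Elasticsearch Java API. As you will see below, the query is fairly large; for each object it returns several buckets along with the serial difference between consecutive buckets. The search examples I have seen for Spring Data repositories all seem to spell out the query's JSON body in a @Query annotation, like this: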

@Repository
public interface SonarMetricRepository extends ElasticsearchRepository<Article, String> {

    @Query("{\"bool\": {\"must\": {\"match\": {\"authors.name\": \"?0\"}}, \"filter\": {\"term\": {\"tags\": \"?1\" }}}}")
    Page<Article> findByAuthorsNameAndFilteredTagQuery(String name, String tag, Pageable pageable);
}
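This seems elegant for basic CRUD operations, but how can I put the query below into a repository object without having to use the raw query syntax of @Query? It would also help if you have a similar example showing a model object built for serial difference query results, or for any pipeline aggregation. Basically, I would like a search method like this in my repository:
Page<Serial Difference Result Object> getCodeCoverageMetrics(String projectKey, Date start, Date end, String interval, int lag);
I should mention that part of the reason I want to use this object is that I will also have other CRUD queries here, and I believe it will handle pagination for me, which is appealing.
Here is my query, which shows the serial difference in code coverage for a Sonar project over a one-week period: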
    SerialDiffPipelineAggregationBuilder serialDiffPipelineAggregationBuilder =
            PipelineAggregatorBuilders
                    .diff("Percent_Change", "avg_coverage")
                    .lag(1);

    AvgAggregationBuilder averageCoverageAggregationBuilder = AggregationBuilders
            .avg("avg_coverage")
            .field("coverage");

    AggregationBuilder coverageHistoryAggregationBuilder = AggregationBuilders
            .dateHistogram("coverage_history")
            .field("@timestamp")
            .calendarInterval(DateHistogramInterval.WEEK)
            .subAggregation(averageCoverageAggregationBuilder)
            .subAggregation(serialDiffPipelineAggregationBuilder);

    TermsAggregationBuilder sonarProjectKeyAggregationBuilder = AggregationBuilders
            .terms("project_key")
            .field("key.keyword")
            .subAggregation(coverageHistoryAggregationBuilder);

    BoolQueryBuilder searchQuery = new BoolQueryBuilder()
            .filter(matchAllQuery())
            .filter(matchPhraseQuery("name.keyword", "my-sample-sonar-project"))
            .filter(rangeQuery("@timestamp")
                    .format("strict_date_optional_time")
                    .gte("2020-07-08T19:29:12.054Z")
                    .lte("2020-07-15T19:29:12.055Z"));

    // Join query and aggregation together
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
            .query(searchQuery)
            .aggregation(sonarProjectKeyAggregationBuilder);

    SearchRequest searchRequest = new SearchRequest("sonarmetrics").source(searchSourceBuilder);
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
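
For readers unfamiliar with the serial_diff pipeline aggregation, here is a minimal plain-Java sketch of the arithmetic it performs inside each project's date histogram: every bucket's value minus the value `lag` buckets earlier. This is not Elasticsearch's implementation. The class and method names are made up for illustration, and the coverage numbers are invented sample values. Note that the result is an absolute difference rather than a percentage, even though the aggregation in the query above is named Percent_Change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, only meant to illustrate the serial difference arithmetic.
class SerialDiffSketch {

    // Returns values[i] - values[i - lag] for each bucket. The first `lag` buckets
    // have no predecessor to subtract, so they get null (no value for that bucket).
    static List<Double> serialDiff(List<Double> values, int lag) {
        if (lag < 1) {
            throw new IllegalArgumentException("lag must be >= 1");
        }
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            result.add(i < lag ? null : values.get(i) - values.get(i - lag));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented weekly avg_coverage values for one project
        List<Double> weeklyAvgCoverage = Arrays.asList(70.0, 72.5, 71.0, 75.0);
        System.out.println(serialDiff(weeklyAvgCoverage, 1)); // prints [null, 2.5, -1.5, 4.0]
    }
}
```

With lag 1 this is simply the week-over-week change; a larger lag compares each bucket with one further back.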

Best answer

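OK, if I understand you correctly, you want to add aggregations to a repository query. This is not possible with the methods that Spring Data Elasticsearch creates automatically, but it is not hard to implement.
To show you how, I use a simpler example in which we define a Person entity: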

@Document(indexName = "person")
public class Person {

    @Id
    @Nullable
    private Long id;

    @Field(type = FieldType.Text, fielddata = true)
    @Nullable
    private String lastName;

    @Field(type = FieldType.Text, fielddata = true)
    @Nullable
    private String firstName;

    // getter/setter
}
And a corresponding repository:
public interface PersonRepository extends ElasticsearchRepository<Person, Long>{
}
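We now want to extend this repository so that we can search for persons with a given first name and, for these persons, also return the top 10 last names with their counts (a terms aggregation on the last names).
The first thing to do is to define a customization repository that describes the method you need: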
interface PersonCustomRepository {
    SearchPage<Person> findByFirstNameWithLastNameCounts(String firstName, Pageable pageable);
}
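We want to pass in a Pageable so that the method returns pages of data. We return a SearchPage object (see the documentation on return types), which contains the paging information along with a SearchHits<Person>. That object in turn holds the aggregation information and the result data.
Then we change PersonRepository to extend this new interface: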
public interface PersonRepository extends ElasticsearchRepository<Person, Long>, PersonCustomRepository {
}
Of course, we now need to provide an implementation in a class named PersonCustomRepositoryImpl (it must be named like the interface with Impl appended):
public class PersonCustomRepositoryImpl implements PersonCustomRepository {

    private final ElasticsearchOperations operations;

    public PersonCustomRepositoryImpl(ElasticsearchOperations operations) { // let Spring inject an operations which we use to do the work
        this.operations = operations;
    }

    @Override
    public SearchPage<Person> findByFirstNameWithLastNameCounts(String firstName, Pageable pageable) {

        Query query = new NativeSearchQueryBuilder()                       // we build an Elasticsearch native query
            .addAggregation(terms("lastNames").field("lastName").size(10)) // add the aggregation
            .withQuery(QueryBuilders.matchQuery("firstName", firstName))   // add the query part
            .withPageable(pageable)                                        // add the requested page
            .build();

        SearchHits<Person> searchHits = operations.search(query, Person.class);  // send it off and get the result

        return SearchHitSupport.searchPageFor(searchHits, pageable);  // convert the result to a SearchPage
    }
}
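That is the implementation of the search. The repository now has this additional method. How do we use it?
For this demo, I assume we have a REST controller that takes a name and returns a pair of:
  • the found persons as a list of SearchHit<Person> objects
  • a Map<String, Long> containing the last names and their counts

This can be implemented as follows; the comments describe what is done: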
    @GetMapping("persons/firstNameWithLastNameCounts/{firstName}")
    public Pair<List<SearchHit<Person>>, Map<String, Long>> firstNameWithLastNameCounts(@PathVariable("firstName") String firstName) {
    
        // helper function to get the lastName counts from an Elasticsearch Aggregations
        // Spring Data Elasticsearch does not have functions for that, so we need to know what is coming back
        Function<Aggregations, Map<String, Long>> getLastNameCounts = aggregations -> {
            if (aggregations != null) {
                Aggregation lastNames = aggregations.get("lastNames");
                if (lastNames != null) {
                    List<? extends Terms.Bucket> buckets = ((Terms) lastNames).getBuckets();
                    if (buckets != null) {
                        return buckets.stream().collect(Collectors.toMap(Terms.Bucket::getKeyAsString, Terms.Bucket::getDocCount));
                    }
                }
            }
            return Collections.emptyMap();
        };
    
        // the parts of the returned object
        Map<String, Long> lastNameCounts = null;
        List<SearchHit<Person>> searchHits = new ArrayList<>();
    
        // request pages of size 1000
        Pageable pageable = PageRequest.of(0, 1000);
        boolean fetchMore = true;
        while (fetchMore) {
            // call the custom method implementation
            SearchPage<Person> searchPage = personRepository.findByFirstNameWithLastNameCounts(firstName, pageable);
    
            // get the aggregations on the first call, will be the same on the other pages
            if (lastNameCounts == null) {
                Aggregations aggregations = searchPage.getSearchHits().getAggregations();
                lastNameCounts = getLastNameCounts.apply(aggregations);
            }
    
            // collect the returned data
            if (searchPage.hasContent()) {
                searchHits.addAll(searchPage.getContent());
            }
    
            pageable = searchPage.nextPageable();
            fetchMore = searchPage.hasNext();
        }
    
        // return the collected stuff
        return Pair.of(searchHits, lastNameCounts);
    }
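
One detail of the helper function above is worth a sketch. The two-argument Collectors.toMap gives no ordering guarantee for the resulting map, so the order in which the terms buckets arrived (by default, highest document count first) may be lost. If that order matters to the caller, the four-argument overload with a LinkedHashMap supplier keeps it. The Bucket class below is a hypothetical stand-in for Terms.Bucket so that the example runs without Elasticsearch on the classpath; the names and counts are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BucketOrderSketch {

    // Hypothetical stand-in for Terms.Bucket with the two accessors used above.
    static final class Bucket {
        private final String key;
        private final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }

        String getKeyAsString() { return key; }
        long getDocCount() { return docCount; }
    }

    // Collects buckets into a map whose iteration order equals the bucket order.
    static Map<String, Long> toOrderedCounts(List<Bucket> buckets) {
        return buckets.stream().collect(Collectors.toMap(
                Bucket::getKeyAsString,
                Bucket::getDocCount,
                (first, second) -> first, // bucket keys are expected to be unique; keep the first if not
                LinkedHashMap::new));     // insertion-ordered map instead of the default
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
                new Bucket("Smith", 3), new Bucket("Jones", 2), new Bucket("Miller", 1));
        System.out.println(toOrderedCounts(buckets)); // prints {Smith=3, Jones=2, Miller=1}
    }
}
```

In the controller, the same four-argument call could replace the Collectors.toMap(Terms.Bucket::getKeyAsString, Terms.Bucket::getDocCount) expression if ordered output is wanted.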
    
I hope this gives an idea of how to implement custom repository functions and add functionality that is not provided out of the box.
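
The question also asked for a model object for the serial difference results. The following is only a sketch under assumptions, not something Spring Data Elasticsearch provides: CoverageDiffPoint and pageOf are hypothetical names, and the field layout is a guess at what one flattened row per project and weekly bucket could look like. Keep in mind that a Pageable passed to the search pages the hits, not the aggregation buckets, so if pages of bucket rows are wanted, one option is to flatten the buckets into a list and slice it in memory, as shown here.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CoverageDiffSketch {

    // Hypothetical flat result row: one entry per (project, date histogram bucket).
    static final class CoverageDiffPoint {
        final String projectKey;
        final Instant bucketStart;
        final double avgCoverage;
        final Double serialDiff; // null where the bucket has no predecessor `lag` steps back

        CoverageDiffPoint(String projectKey, Instant bucketStart, double avgCoverage, Double serialDiff) {
            this.projectKey = projectKey;
            this.bucketStart = bucketStart;
            this.avgCoverage = avgCoverage;
            this.serialDiff = serialDiff;
        }
    }

    // Cuts one zero-based page out of an in-memory list of rows.
    static <T> List<T> pageOf(List<T> rows, int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        long from = (long) page * size;
        if (from >= rows.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + size, rows.size());
        return new ArrayList<>(rows.subList((int) from, to));
    }

    public static void main(String[] args) {
        // Invented sample rows for a single project
        List<CoverageDiffPoint> rows = Arrays.asList(
                new CoverageDiffPoint("my-sample-sonar-project", Instant.parse("2020-06-29T00:00:00Z"), 70.0, null),
                new CoverageDiffPoint("my-sample-sonar-project", Instant.parse("2020-07-06T00:00:00Z"), 72.5, 2.5),
                new CoverageDiffPoint("my-sample-sonar-project", Instant.parse("2020-07-13T00:00:00Z"), 71.0, -1.5));
        System.out.println(pageOf(rows, 1, 2).size()); // prints 1
    }
}
```

The rows themselves would be filled in a custom repository implementation like PersonCustomRepositoryImpl above, by walking the terms, date histogram and serial difference buckets of the response.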

Source: this question and answer were found on Stack Overflow: https://stackoverflow.com/questions/63075178/
