Is {Filter}ing faster than {Query}ing in Lucene?

Tags: java, lucene

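While reading "Lucene in Action, 2nd edition" I came across a description of the Filter classes, which can be used to filter results in Lucene. Lucene has many filters that duplicate Query classes, for example NumericRangeQuery and NumericRangeFilter.

The book says that NRF does exactly the same as NRQ, but without document scoring. Does this mean that if I need neither scoring nor sorting of documents by field value, I should prefer Filters over Queries from a performance point of view?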

Best answer

I got a great answer from Uwe Schindler; let me repost it here.

If you don't cache filters, queries will be faster, as the ConjunctionScorer in Lucene has optimizations which are currently not used for Filters. Filters are fine if you cache them (e.g. if you always have the same access restrictions for a specific user that are applied to all his queries). In that case the Filter is executed only once, cached for all further requests, and then intersected with the query result set.
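To make the caching point concrete, here is a minimal toy sketch that is not part of Uwe's reply and does not use Lucene's API. It assumes doc IDs can be modelled as bit positions in a `java.util.BitSet`; the class `CachedFilterSketch`, its methods, and the key `"acl:user42"` are hypothetical names chosen for illustration. The filter's bit set is computed on first use, kept in a map, and intersected with each query's hits afterwards.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: doc IDs are bit positions; this is not Lucene's API.
class CachedFilterSketch {
    // Cache of filter bit sets, keyed by a hypothetical filter name.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Counts how often a filter was actually computed (for illustration).
    int computations = 0;

    // Returns the cached bit set, computing it only on first use.
    BitSet filterBits(String key, Supplier<BitSet> compute) {
        return cache.computeIfAbsent(key, k -> {
            computations++;
            return compute.get();
        });
    }

    // Intersects a query's matching docs with the (cached) filter.
    BitSet search(BitSet queryHits, String filterKey, Supplier<BitSet> compute) {
        BitSet result = (BitSet) queryHits.clone(); // do not mutate the caller's hits
        result.and(filterBits(filterKey, compute));
        return result;
    }

    // Convenience helper: builds a bit set from a list of doc IDs.
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        CachedFilterSketch s = new CachedFilterSketch();
        // Hypothetical access restriction: the docs this user may see.
        Supplier<BitSet> userAcl = () -> bits(1, 2, 3, 5, 8);
        System.out.println(s.search(bits(2, 4, 8), "acl:user42", userAcl));
        System.out.println(s.search(bits(1, 5, 9), "acl:user42", userAcl));
        System.out.println("filter computed " + s.computations + " time(s)");
    }
}
```

In this sketch the second search reuses the stored bit set, so the potentially expensive filter computation runs once no matter how many queries the same user issues.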

If you only want to "filter" ad hoc, e.g. by a variable numeric range such as a bounding box in a geographic search, use queries; they are faster in most cases (range queries and similar, called MultiTermQueries, are internally implemented with the same BitSet algorithm as the Filter; in fact they are just Filters wrapped by a Scorer implementation). But the Scorer that ANDs the query and your "filter" query together (ConjunctionScorer) is generally faster than the code that applies the filter after searching. Some improvement may be possible here, but in general Filters are something in Lucene that is not really needed anymore, so there have already been some approaches to make Filters and Queries the same, and then also be able to cache non-scoring queries. This would make a lot of code simpler.
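As a rough illustration of why ANDing two clauses can beat filtering afterwards, the toy sketch below (not part of Uwe's reply, and not Lucene's actual ConjunctionScorer code) contrasts the two strategies. It assumes each clause exposes its matches as a sorted list of unique doc IDs with a skip-ahead `advance` operation; `ConjunctionSketch`, `DocIds`, `intersect`, and `postFilter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model only: sorted int arrays stand in for doc ID iterators (not Lucene's API).
class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Forward-only cursor over a sorted array of unique doc IDs.
    static final class DocIds {
        private final int[] docs;
        private int pos = -1;

        DocIds(int... sortedDocs) {
            this.docs = sortedDocs;
        }

        // Moves forward to the first doc ID >= target; target must exceed the current doc.
        int advance(int target) {
            int from = Math.min(pos + 1, docs.length);
            int i = Arrays.binarySearch(docs, from, docs.length, target);
            pos = i >= 0 ? i : -i - 1; // insertion point when target is absent
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // Conjunction: each side skips directly to the other's current doc.
    static List<Integer> intersect(DocIds a, DocIds b) {
        List<Integer> out = new ArrayList<>();
        int da = a.advance(0);
        int db = b.advance(0);
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                out.add(da);
                da = a.advance(da + 1);
                db = b.advance(db + 1);
            } else if (da < db) {
                da = a.advance(db);
            } else {
                db = b.advance(da);
            }
        }
        return out;
    }

    // Post-filtering: visit every query hit, then test it against the filter bits.
    static List<Integer> postFilter(int[] queryHits, BitSet allowed) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryHits) {
            if (allowed.get(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] queryHits = {1, 3, 7, 20, 50, 51, 90};
        BitSet allowed = new BitSet();
        for (int doc : new int[] {7, 50, 90}) {
            allowed.set(doc);
        }
        System.out.println(intersect(new DocIds(queryHits), new DocIds(7, 50, 90)));
        System.out.println(postFilter(queryHits, allowed));
    }
}
```

Both methods return the same doc IDs. The difference in this model is that `intersect` lets the sparser clause drive and jumps over non-matching docs, while `postFilter` has to touch every query hit.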

Filters can bring a huge speed improvement with Lucene 4.0 if they are plugged on top of the IndexReader to filter the documents before scoring, but that's not yet implemented (see https://issues.apache.org/jira/browse/LUCENE-3212) - I am working on it. We may also make Filters random access (it's easy, as they are bitsets), which could also improve the after-query filtering. But I would then also make Queries partially random access, if they could support it (like queries that are only based on FieldCache).

Uwe

Source: Stack Overflow, https://stackoverflow.com/questions/6462350/
