java - How to use CustomScoreQuery in Lucene 7.x

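I am fairly new to Lucene and want to implement my own CustomScoreQuery because I need it for my university.

I used the Lucene demo as a starting point to index all documents in a folder, and I want to score them with my own algorithm.

Here are the links to the demo source code:

https://lucene.apache.org/core/7_1_0/demo/src-html/org/apache/lucene/demo/IndexFiles.html
https://lucene.apache.org/core/7_1_0/demo/src-html/org/apache/lucene/demo/SearchFiles.html

I am checking the index with Luke (the Lucene Toolbox Project) to see whether it looks as expected. My problem is accessing the index from my own code: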

package CustomModul;
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.queries.CustomScoreProvider;
import org.apache.lucene.queries.CustomScoreQuery;
import org.apache.lucene.search.Query;

public class CountingQuery extends CustomScoreQuery {

    public CountingQuery(Query subQuery) {
        super(subQuery);
    }

    public class CountingQueryScoreProvider extends CustomScoreProvider {

        String _field;

        public CountingQueryScoreProvider(String field, LeafReaderContext context) {
            super(context);
            _field = field;
        }

        @Override
        public float customScore(int doc, float subQueryScore, float[] valSrcScores) throws IOException {
            IndexReader r = context.reader();

            // getTermVector returns null here
            Terms vec = r.getTermVector(doc, _field);

            // TODO: scoring algorithm

            return 1.0f;
        }
    }

    @Override
    protected CustomScoreProvider getCustomScoreProvider(
            LeafReaderContext context) throws IOException {
        return new CountingQueryScoreProvider("contents", context);
    }

}

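In my customScore function I access the index the way most tutorials describe: through getTermVector. However, it returns null. In other posts I read that this could be because the contents field is declared as a TextField in the Lucene demo's IndexFiles.

After trying many different approaches, I have concluded that I need help, so here I am.

My question is: do I need to adjust the indexing process (and if so, how?), or is there another way besides getTermVector to access the index from within the ScoreProvider?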

Best answer

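I was able to solve this myself, and I want to share my solution in case anyone finds this question while looking for an answer.

The problem was indeed caused by contents being a TextField in IndexFiles (https://lucene.apache.org/core/7_1_0/demo/src-html/org/apache/lucene/demo/IndexFiles.html). A TextField does not store term vectors, so getTermVector has nothing to return.

To solve this, we have to build our own field type. I replaced line 193 of IndexFiles with the following: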

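// Start from TextField's type, then override the settings we need.
FieldType myFieldType = new FieldType(TextField.TYPE_STORED);
myFieldType.setOmitNorms(true);
myFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
// A field whose value comes from a Reader cannot be stored.
myFieldType.setStored(false);
// Required for getTermVector to return a non-null Terms.
myFieldType.setStoreTermVectors(true);
myFieldType.setTokenized(true);
myFieldType.freeze(); // prevent further changes to the type
Field myField = new Field("contents",
        new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8)),
        myFieldType);
doc.add(myField);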

Once the documents are re-indexed with this field, getTermVector can be used in the customScore function. I hope this helps someone in the future.
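For the step the question leaves as a TODO, here is a minimal sketch of what a counting-style score could look like once the term vector is available. It is only an illustration: the class TermCountScore, its score method, and the idea of summing term frequencies are assumptions, not something from the question or from Lucene. The sketch works on a plain Map so it can run without Lucene; the comment shows how that map might be filled from the Terms returned by getTermVector.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical helper, not part of Lucene: scores a document by counting
// how often the given terms occur in it.
final class TermCountScore {

    // Inside customScore, the map could be filled from the term vector, e.g.:
    //   TermsEnum te = vec.iterator();
    //   for (BytesRef t = te.next(); t != null; t = te.next()) {
    //       termFreqs.put(t.utf8ToString(), te.totalTermFreq());
    //   }
    // On a term vector, totalTermFreq() is the frequency within that document.
    static float score(Map<String, Long> termFreqs, Collection<String> queryTerms) {
        long total = 0;
        for (String term : queryTerms) {
            // Terms missing from the document contribute nothing.
            total += termFreqs.getOrDefault(term, 0L);
        }
        return (float) total;
    }
}
```

In customScore, the result could be returned directly or combined with subQueryScore, depending on what the scoring algorithm is meant to express.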

The original question, "java - How to use CustomScoreQuery in Lucene 7.x", can be found on Stack Overflow: https://stackoverflow.com/questions/48383780/
