java - Auto-suggest in Lucene stops working after the first search iteration

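I am currently using Lucene for the auto-suggest part of my application. Word auto-suggestion works fine in a console application, but now that I have integrated it into a web application it no longer works as expected.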

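The first time I search for documents with some keyword, both the search and the auto-suggest work and show results. But when I search again, with either a different keyword or the same one, neither the auto-suggestions nor the search results are shown. I cannot figure out why this strange result occurs.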

The snippets for auto-suggest and search are as follows:

final int HITS_PER_PAGE = 20;

final String RICH_DOCUMENT_PATH = "F:\\Sample\\SampleRichDocuments";
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";

String searchText = request.getParameter("search_text");

BooleanQuery.Builder booleanQuery = null;
Query textQuery = null;
Query fileNameQuery = null;

try {
    textQuery = new QueryParser("content", new StandardAnalyzer()).parse(searchText);
    fileNameQuery = new QueryParser("title", new StandardAnalyzer()).parse(searchText);
    booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(textQuery, BooleanClause.Occur.SHOULD);
    booleanQuery.add(fileNameQuery, BooleanClause.Occur.SHOULD);
} catch (ParseException e) {
    e.printStackTrace();
}


Directory index = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexReader reader = DirectoryReader.open(index);

IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE);

try{
    searcher.search(booleanQuery.build(), collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    for (ScoreDoc hit : hits) {
        Document doc = reader.document(hit.doc);
    }

    // Auto Suggestion of the data

    Dictionary dictionary = new LuceneDictionary(reader, "content");
    AnalyzingInfixSuggester analyzingSuggester = new AnalyzingInfixSuggester(index, new StandardAnalyzer());
    analyzingSuggester.build(dictionary);

    List<LookupResult> lookupResultList = analyzingSuggester.lookup(searchText, false, 10);
    System.out.println("Look up result size :: "+lookupResultList.size());
    for (LookupResult lookupResult : lookupResultList) {
         System.out.println(lookupResult.key+" --- "+lookupResult.value);
    }

    analyzingSuggester.close();
    reader.close();

}catch(IOException e){
    e.printStackTrace();
}

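For example: in the first iteration, if I search for the word "sample",

auto-suggest gives me: sample, samples, sampler, etc. (these are words from the documents)
the search result is: sample

But if I search again with the same or a different text, no results are shown, and the LookupResult list size also drops to zero.

I don't understand why this happens. Please help.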

Below is the updated code that creates the index from the document set.

final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
long startTime = System.currentTimeMillis();
List<ContentHandler> contentHandlerList = new ArrayList<ContentHandler>();

String fileNames = (String)request.getAttribute("message");

File file = new File("F:\\Sample\\SampleRichDocuments"+fileNames);

ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file);

Metadata metadata = new Metadata();

// Parsing the rich document set with Apache Tika
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(file);

try {
    parser.parse(stream, handler, metadata, context);
    contentHandlerList.add(handler);
}catch (TikaException e) {
    e.printStackTrace();
}catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
finally {
    try {
        stream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStored(true);

Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(directory, conf);

Iterator<ContentHandler> handlerIterator = contentHandlerList.iterator();
Iterator<File> fileIterator = fileList.iterator();

Date date = new Date();

while (handlerIterator.hasNext() && fileIterator.hasNext()) {
    Document doc = new Document();

    String text = handlerIterator.next().toString();
    String textFileName = fileIterator.next().getName();

    String fileName = textFileName.replaceAll("_", " ");
    fileName = fileName.replaceAll("-", " ");
    fileName = fileName.replaceAll("\\.", " ");

    String fileNameArr[] = fileName.split("\\s+");
    for (String contentTitle : fileNameArr) {
        Field titleField = new Field("title", contentTitle, fieldType);
        titleField.setBoost(2.0f);
        doc.add(titleField);
    }

    if (fileNameArr.length > 0) {
        fileName = fileNameArr[0];
    }

    String document_id = UUID.randomUUID().toString();

    FieldType documentFieldType = new FieldType();
    documentFieldType.setStored(false);

    Field idField = new Field("document_id", document_id, documentFieldType);
    Field fileNameField = new Field("file_name", textFileName, fieldType);
    Field contentField = new Field("content", text, fieldType);

    doc.add(idField);
    doc.add(contentField);
    doc.add(fileNameField);

    writer.addDocument(doc);

    analyzer.close();
}

writer.commit();
writer.deleteUnusedFiles();
long endTime = System.currentTimeMillis();

writer.close();

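I have also observed that, from the second search iteration onward, the files in the index directory get deleted, and only the file with the .segment suffix keeps changing, e.g. .segmenta, .segmentb, .segmentc, and so on.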

I don't know why this strange situation occurs.
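
To pin down exactly which files disappear between searches, one option is to record the index directory listing before and after each request and compare the two. The sketch below uses only `java.nio.file`; the class `IndexDirSnapshot` and its methods are hypothetical helpers introduced for illustration, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic helper (not a Lucene class).
public class IndexDirSnapshot {

    // Returns the sorted names of all entries directly inside dir.
    public static Set<String> list(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }

    // Returns the names that are present in 'before' but missing from 'after'.
    public static Set<String> removed(Set<String> before, Set<String> after) {
        Set<String> gone = new TreeSet<>(before);
        gone.removeAll(after);
        return gone;
    }
}
```

Calling `IndexDirSnapshot.list(Paths.get(INDEX_DIRECTORY))` at the start of the search request and again after `reader.close()`, then logging `removed(before, after)`, should show whether the files vanish during the search request itself or at some other point.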

Best answer

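Your code looks pretty straightforward, so I suspect you are running into this because something is wrong with your index. Information on how you build the index would help diagnose it. But the exact code this time :)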
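
One thing worth ruling out in that direction (a reading of the posted code, not something confirmed in the thread): the search snippet passes the same `index` directory to `AnalyzingInfixSuggester` that holds the main index. The suggester keeps its own Lucene index in whatever directory it is given, so sharing a directory with the main index could plausibly clobber one or the other. A minimal sketch of choosing a separate location, using only `java.nio.file`; the `SuggesterDir` helper and the `LuceneSuggester` folder name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Lucene class).
public class SuggesterDir {

    // Resolves a sibling folder next to the main index and creates it if needed.
    // Throws if the result would be the same location as the main index.
    public static Path resolve(String mainIndexDir, String suggesterFolderName) throws IOException {
        Path main = Paths.get(mainIndexDir).toAbsolutePath().normalize();
        Path suggester = main.resolveSibling(suggesterFolderName).normalize();
        if (suggester.equals(main)) {
            throw new IllegalArgumentException(
                    "Suggester directory must differ from the main index directory");
        }
        Files.createDirectories(suggester);
        return suggester;
    }
}
```

With this in place, the suggester could be opened on `FSDirectory.open(SuggesterDir.resolve(INDEX_DIRECTORY, "LuceneSuggester"))` instead of on `index`, leaving the main index directory to the `IndexWriter` and `DirectoryReader` alone. If the files still disappear after that change, the cause lies elsewhere.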

This question, "java - Auto-suggest in Lucene stops working after the first search iteration", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/39320279/
