search - Lucene - simpleAnalyzer - How to get the matched word?

Tags: search lucene full-text-search analyzer

With the code below I can't get the offset of the matched word, or the matched word itself. Any help would be greatly appreciated.

    ...
    Analyzer analyzer = new SimpleAnalyzer();
    MemoryIndex index = new MemoryIndex();
    // Assumed: `text` holds the document being searched; without addField the MemoryIndex is empty.
    index.addField("content", text, analyzer);

    QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);

    float score = index.search(parser.parse("+content:" + target));

    if (score > 0.0f)
        System.out.println("How to know matched word?");
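To show what "getting the matched word" involves, here is a plain-Java sketch with no Lucene dependency that approximates SimpleAnalyzer (in Lucene, a letter tokenizer followed by lowercasing). It splits the text on non-letter characters, lowercases each token and keeps each token's start/end character offsets. Comparing the analyzed query word against these tokens gives both the offsets and the original surface form. The names `SimpleTokens`, `tokenize` and `matches` are hypothetical, and this is an approximation, not Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, Lucene-free approximation of SimpleAnalyzer:
// split on non-letter characters, then lowercase each token.
// It only illustrates how token offsets can be tracked.
final class SimpleTokens {

    // One token: the lowercased term plus its [start, end) character offsets in the source text.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Splits text into maximal runs of letters, lowercasing each run.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return tokens;
    }

    // Returns every token whose analyzed term equals the analyzed query word;
    // text.substring(start, end) then recovers the word as it appears in the text.
    static List<Token> matches(String text, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        List<Token> hits = new ArrayList<>();
        for (Token t : tokenize(text)) {
            if (t.term.equals(target)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Lucene in Action: lucene-based search";
        for (Token t : matches(text, "LUCENE")) {
            System.out.println(t + " -> \"" + text.substring(t.start, t.end) + "\"");
        }
    }
}
```

In Lucene itself the equivalent information comes from iterating the analyzer's TokenStream and reading its OffsetAttribute; the sketch only mirrors that idea in plain Java.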

Best answer

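Here is a complete example of in-memory indexing and searching. I wrote it for my own use and it works well. I understand that you need to keep the index in memory, but why do you need MemoryIndex for that? You can simply use a RAMDirectory: the index is then stored in memory, and searches read it from the RAMDirectory (that is, from memory). The key part for your question is storing term vectors with offsets (Field.TermVector.WITH_OFFSETS), which lets you look up where each occurrence of the matched term starts and ends in the original text.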

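    // Lucene 3.4 API. `text` (the document), `word` (the search term), `size`,
    // `concordances` and `processor` come from surrounding code that is not shown.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
    RAMDirectory directory = new RAMDirectory();
    try {
        // Index the text with term vectors that record character offsets.
        IndexWriter indexWriter = new IndexWriter(directory, config);
        Document doc = new Document();
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_OFFSETS));
        indexWriter.addDocument(doc);
        indexWriter.optimize();
        indexWriter.close();

        QueryParser parser = new QueryParser(Version.LUCENE_34, "content", analyzer);
        IndexSearcher searcher = new IndexSearcher(directory, true);
        // Reuse the searcher's reader instead of opening a second IndexReader that is never closed.
        IndexReader reader = searcher.getIndexReader();

        Query query = parser.parse(word);
        TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        for (ScoreDoc hit : hits) {
            int docId = hit.doc;
            Document hitDoc = searcher.doc(docId);

            // Because offsets were stored, the term vector can be cast to TermPositionVector.
            TermFreqVector termFreqVector = reader.getTermFreqVector(docId, "content");
            TermPositionVector termPositionVector = (TermPositionVector) termFreqVector;
            // Term vectors hold analyzed terms; StandardAnalyzer lowercases, so look up the lowercased word.
            int termIndex = termFreqVector.indexOf(word.toLowerCase());
            if (termIndex < 0) {
                continue; // `word` does not match an indexed term after analysis
            }
            TermVectorOffsetInfo[] termVectorOffsetInfos = termPositionVector.getOffsets(termIndex);

            for (TermVectorOffsetInfo termVectorOffsetInfo : termVectorOffsetInfos) {
                // getStartOffset()/getEndOffset() locate this occurrence of the matched word in the stored text.
                concordances.add(processor.processConcordance(hitDoc.get("content"), word, termVectorOffsetInfo.getStartOffset(), size));
            }
        }

        searcher.close();
    } catch (ParseException e) {
        throw new IllegalArgumentException("Invalid query: " + word, e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    } finally {
        analyzer.close();
        directory.close();
    }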
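The answer's loop reads start offsets from the term vector and passes them to `processor.processConcordance(...)`, whose code is not shown. As a hedged, Lucene-free sketch of that last step, the class below builds a term-to-offsets map (roughly what a term vector stored WITH_OFFSETS records for one document, though here with SimpleAnalyzer-style tokenization rather than StandardAnalyzer) and cuts a fixed-size context window around each occurrence. `Concordance`, `termOffsets`, `concordance` and `concordances` are hypothetical names, and the windowing is only a guess at what processConcordance does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, Lucene-free sketch. termOffsets() mimics what a term vector stored
// WITH_OFFSETS records for one document; concordance() is only a guess at what the
// answer's processor.processConcordance(...) does.
final class Concordance {

    // Maps each analyzed term to its [start, end) character offsets in the text.
    // Tokenization is SimpleAnalyzer-like (letter runs, lowercased), not StandardAnalyzer.
    static Map<String, List<int[]>> termOffsets(String text) {
        Map<String, List<int[]>> offsets = new LinkedHashMap<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            offsets.computeIfAbsent(term, k -> new ArrayList<>()).add(new int[] {start, i});
        }
        return offsets;
    }

    // Returns the match plus up to `size` characters of context on each side.
    static String concordance(String text, int start, int end, int size) {
        int from = Math.max(0, start - size);
        int to = Math.min(text.length(), end + size);
        return text.substring(from, to);
    }

    // One context snippet per occurrence of the analyzed word, in document order.
    static List<String> concordances(String text, String word, int size) {
        List<String> result = new ArrayList<>();
        List<int[]> spans = termOffsets(text).get(word.toLowerCase(Locale.ROOT));
        if (spans == null) {
            return result; // the word does not occur after analysis
        }
        for (int[] span : spans) {
            result.add(concordance(text, span[0], span[1], size));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Search with Lucene. Lucene stores offsets.";
        for (String snippet : concordances(text, "LUCENE", 5)) {
            System.out.println(snippet);
        }
    }
}
```

Keeping both start and end offsets (rather than start plus the query word's length) means the window is cut around the text as it actually appears, which matters when analysis changes the token's form.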

For the question "search - Lucene - simpleAnalyzer - How to get the matched word?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/9715464/
