c# - Lucene.Net (4.8) AutoComplete / AutoSuggest

Tags: c# lucene.net

I want to implement a searchable index with Lucene.Net 4.8 that offers users suggestions/autocomplete for single words and phrases.

Building the index works; the suggestions are where I'm stuck.

Version 4.8 seems to have introduced a lot of breaking changes, and none of the examples I've found work with it.

Where I'm at

For reference, LuceneVersion is declared like this:

    private readonly LuceneVersion LuceneVersion = LuceneVersion.LUCENE_48;

Solution 1

I've tried this, but I can't get past reader.Terms:

    public void TryAutoComplete()
    {
        var analyzer = new EnglishAnalyzer(LuceneVersion);
        var config = new IndexWriterConfig(LuceneVersion, analyzer);
        RAMDirectory dir = new RAMDirectory();
        using (IndexWriter iw = new IndexWriter(dir, config))
        {
            Document d = new Document();
            TextField f = new TextField("text","",Field.Store.YES);
            d.Add(f);
            f.SetStringValue("abc");
            iw.AddDocument(d);
            f.SetStringValue("colorado");
            iw.AddDocument(d);
            f.SetStringValue("coloring book");
            iw.AddDocument(d);
            iw.Commit();
            using (IndexReader reader = iw.GetReader(false))
            {
                TermEnum terms = reader.Terms(new Term("text", "co"));
                int maxSuggestsCpt = 0;
                // will print:
                // colorado
                // coloring book
                do
                {
                    Console.WriteLine(terms.Term.Text);
                    maxSuggestsCpt++;
                    if (maxSuggestsCpt >= 5)
                        break;
                }
                while (terms.Next() && terms.Term.Text.StartsWith("co"));
            }
        }
    }

reader.Terms no longer exists. Being new to Lucene, I can't see how to refactor this.
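For reference, here is a minimal sketch of how the Solution 1 loop might look against the 4.x API, assuming the Lucene.Net 4.8.0 beta packages (Lucene.Net and Lucene.Net.Analysis.Common). In 4.x the per-field term dictionary is reached through MultiFields.GetTerms, a TermsEnum takes the place of TermEnum, and SeekCeil positions the enum on the first term at or after the prefix. The class and method names (TermPrefixSuggester, SuggestTerms, RunDemo) are hypothetical. The enumeration calls changed between beta builds: this sketch assumes Terms.GetEnumerator() and TermsEnum.MoveNext(), while older betas expose Terms.GetIterator(null) and TermsEnum.Next() instead, so adjust to the build you have.

    using System;
    using System.Collections.Generic;
    using Lucene.Net.Analysis.En;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;
    using Lucene.Net.Util;

    public static class TermPrefixSuggester
    {
        const LuceneVersion Version = LuceneVersion.LUCENE_48;

        // Returns up to `max` indexed terms of `field` that start with `prefix`.
        // Terms come back in their analyzed form, so `prefix` should be analyzed
        // the same way (at least lowercased) before calling this.
        public static IList<string> SuggestTerms(IndexReader reader, string field, string prefix, int max)
        {
            var results = new List<string>();
            Terms terms = MultiFields.GetTerms(reader, field); // null if the field has no indexed terms
            if (terms == null)
                return results;

            // Assumption: recent 4.8.0 betas; older betas use terms.GetIterator(null).
            TermsEnum termsEnum = terms.GetEnumerator();

            // Terms are stored sorted, so jump straight to the first term >= prefix.
            if (termsEnum.SeekCeil(new BytesRef(prefix)) == TermsEnum.SeekStatus.END)
                return results;

            do
            {
                string text = termsEnum.Term.Utf8ToString();
                if (!text.StartsWith(prefix, StringComparison.Ordinal))
                    break; // walked past the block of terms sharing the prefix
                results.Add(text);
            }
            // Assumption: MoveNext() is the newer name; older betas use Next() != null.
            while (results.Count < max && termsEnum.MoveNext());

            return results;
        }

        // Builds the same three-document index as Solution 1 and asks for "co" completions.
        public static IList<string> RunDemo()
        {
            using (var dir = new RAMDirectory())
            using (var analyzer = new EnglishAnalyzer(Version))
            {
                using (var writer = new IndexWriter(dir, new IndexWriterConfig(Version, analyzer)))
                {
                    foreach (string text in new[] { "abc", "colorado", "coloring book" })
                        writer.AddDocument(new Document { new TextField("text", text, Field.Store.YES) });
                    writer.Commit();
                }

                using (DirectoryReader reader = DirectoryReader.Open(dir))
                    return SuggestTerms(reader, "text", "co", 5);
            }
        }

        public static void Main()
        {
            foreach (string term in RunDemo())
                Console.WriteLine(term);
        }
    }

Because the field is a TextField analyzed by EnglishAnalyzer, the terms come back lowercased and stemmed: "coloring book" contributes the separate terms color and book rather than the whole phrase. For phrase completions, a StringField (as in the answer further down) keeps each value as one term. For the Atlanta example in the clarification, the input would likewise need lowercasing ("atl") before seeking.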

Solution 2

Trying this, I get an error:

    public void TryAutoComplete2()
    {
        using(var analyzer = new EnglishAnalyzer(LuceneVersion))
        {
            IndexWriterConfig config = new IndexWriterConfig(LuceneVersion, analyzer);
            RAMDirectory dir = new RAMDirectory();
            using(var iw = new IndexWriter(dir,config))
            {
                Document d = new Document()
                {
                    new TextField("text", "this is a document with a some words",Field.Store.YES),
                    new Int32Field("id", 42, Field.Store.YES)
                };

                iw.AddDocument(d);
                iw.Commit();

                using (IndexReader reader = iw.GetReader(false))
                using (SpellChecker speller = new SpellChecker(new RAMDirectory()))
                {
                    //ERROR HERE!!!
                    speller.IndexDictionary(new LuceneDictionary(reader, "text"), config, false);
                    string[] suggestions = speller.SuggestSimilar("dcument", 5);
                    IndexSearcher searcher = new IndexSearcher(reader);
                    foreach (string suggestion in suggestions)
                    {
                        TopDocs docs = searcher.Search(new TermQuery(new Term("text", suggestion)), null, Int32.MaxValue);
                        foreach (var doc in docs.ScoreDocs)
                        {
                            System.Diagnostics.Debug.WriteLine(searcher.Doc(doc.Doc).Get("id"));
                        }
                    }
                }
            }
        }
    }

While debugging, speller.IndexDictionary(new LuceneDictionary(reader, "text"), config, false); throws a "The object cannot be set twice!" error that I can't explain.
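A likely explanation, offered as my reading of the error rather than something the question confirms: in Lucene 4.x an IndexWriterConfig can be attached to only one IndexWriter, because the IndexWriter constructor claims it through a SetOnce field, and "The object cannot be set twice!" is the message that guard throws. SpellChecker.IndexDictionary opens its own IndexWriter on the spell-check directory, so passing the config already used by iw trips it. The sketch below uses hypothetical class and method names, assumes the Lucene.Net.Suggest package for SpellChecker and LuceneDictionary, and simply gives the spell checker a config of its own.

    using System;
    using Lucene.Net.Analysis.En;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search.Spell;
    using Lucene.Net.Store;
    using Lucene.Net.Util;

    public static class SpellCheckerConfigFix
    {
        const LuceneVersion Version = LuceneVersion.LUCENE_48;

        public static string[] SuggestForMisspelling(string misspelled)
        {
            using (var dir = new RAMDirectory())
            using (var spellDir = new RAMDirectory())
            using (var analyzer = new EnglishAnalyzer(Version))
            {
                // First config: claimed by this IndexWriter and not reusable afterwards.
                using (var writer = new IndexWriter(dir, new IndexWriterConfig(Version, analyzer)))
                {
                    writer.AddDocument(new Document
                    {
                        new TextField("text", "this is a document with a some words", Field.Store.YES),
                        new Int32Field("id", 42, Field.Store.YES)
                    });
                    writer.Commit();
                }

                using (DirectoryReader reader = DirectoryReader.Open(dir))
                using (var speller = new SpellChecker(spellDir))
                {
                    // Second, separate config for the IndexWriter that IndexDictionary opens
                    // internally; reusing the first one is what raises the exception.
                    var spellConfig = new IndexWriterConfig(Version, analyzer);
                    speller.IndexDictionary(new LuceneDictionary(reader, "text"), spellConfig, false);
                    return speller.SuggestSimilar(misspelled, 5);
                }
            }
        }

        public static void Main()
        {
            foreach (string suggestion in SuggestForMisspelling("dcument"))
                Console.WriteLine(suggestion);
        }
    }

The same single-use rule applies anywhere a config object is cached and handed to more than one IndexWriter.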

Any ideas are welcome.

Clarification

I want to return a list of suggested terms for a given input, not documents or their full contents.

For example, if a document contains "Hello, my name is Clark. I'm from Atlanta" and I submit "Atl", then "Atlanta" should come back as a suggestion.

Best answer

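If I understand you correctly, you may be overcomplicating your index design. If your goal is autocomplete with Lucene, build an index of the terms you consider completions, then query that index with a PrefixQuery using the partial word or phrase.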

    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.En;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Lucene.Net.Util;
    using System;
    using System.Linq;

    namespace LuceneDemoApp
    {
        class LuceneAutoCompleteIndex : IDisposable
        {
            const LuceneVersion Version = LuceneVersion.LUCENE_48;
            RAMDirectory Directory;
            Analyzer Analyzer;

            public LuceneAutoCompleteIndex(string fieldName, int maxResults)
            {
                FieldName = fieldName;
                MaxResults = maxResults;
                Directory = new RAMDirectory();
                Analyzer = new EnglishAnalyzer(Version);
            }

            public string FieldName { get; }
            public int MaxResults { get; set; }

            // An IndexWriterConfig can only be used by one IndexWriter, so build a
            // fresh config for every writer instead of caching a single instance
            // (otherwise a second Add/AddRange call would fail).
            private IndexWriter CreateWriter()
            {
                var config = new IndexWriterConfig(Version, Analyzer)
                {
                    OpenMode = OpenMode.CREATE_OR_APPEND
                };
                return new IndexWriter(Directory, config);
            }

            // Each completion is stored as one un-analyzed StringField term.
            private void IndexDoc(IndexWriter writer, string term)
            {
                Document doc = new Document();
                doc.Add(new StringField(FieldName, term, Field.Store.YES));
                writer.AddDocument(doc);
            }

            public void Add(string term)
            {
                using (var writer = CreateWriter())
                {
                    IndexDoc(writer, term);
                }
            }

            public void AddRange(string[] terms)
            {
                using (var writer = CreateWriter())
                {
                    foreach (string term in terms)
                    {
                        IndexDoc(writer, term);
                    }
                }
            }

            public string[] WhereStartsWith(string term)
            {
                // Nothing has been added yet, so there is no index to open.
                if (!DirectoryReader.IndexExists(Directory))
                    return new string[0];

                using (var reader = DirectoryReader.Open(Directory))
                {
                    IndexSearcher searcher = new IndexSearcher(reader);
                    var query = new PrefixQuery(new Term(FieldName, term));
                    TopDocs foundDocs = searcher.Search(query, MaxResults);
                    var matches = foundDocs.ScoreDocs
                        .Select(scoreDoc => searcher.Doc(scoreDoc.Doc).Get(FieldName))
                        .ToArray();

                    return matches;
                }
            }

            public void Dispose()
            {
                Directory.Dispose();
                Analyzer.Dispose();
            }
        }
    }

Running this:

    var indexValues = new string[] { "apple fruit", "appricot", "ape", "avacado", "banana", "pear" };
    var index = new LuceneAutoCompleteIndex("fn", 10);
    index.AddRange(indexValues);

    var matches = index.WhereStartsWith("app");
    foreach (var match in matches)
    {
        Console.WriteLine(match);
    }

gives you this:

    apple fruit
    appricot

Original question on Stack Overflow: https://stackoverflow.com/questions/60417740/
