c# - Making Elasticsearch diacritics-insensitive

Tags: c# asp.net-mvc elasticsearch nest

I am using Elasticsearch 6.6.0 with NEST in a .NET MVC project.

I index some products with the following code:

var esSettings = new ConnectionSettings(node);
esSettings = esSettings.DefaultIndex(IndexInstanceName);
esSettings = esSettings
    .DefaultMappingFor<SearchableProduct>(s => s.IdProperty("Id").IndexName(IndexInstanceName + "-products-" + ConfigurationManager.AppSettings["DefaultCulture"]));

var elastic = new ElasticClient(esSettings);
var mapResponse = elastic.Map<SearchableProduct>(x => x.AutoMap().Index(IndexInstanceName + "-products-" + culture));

var indexState = new IndexState
{
    Settings = new IndexSettings()
};

indexState.Settings.Analysis = new Analysis
{
    Analyzers = new Analyzers()
};

indexState.Settings.Analysis.Analyzers.Add("nospecialchars", new CustomAnalyzer
{
    Tokenizer = "standard",
    Filter = new List<string> { "standard", "lowercase", "stop", "asciifolding" }
});

//products
if (!elastic.IndexExists(IndexInstanceName + "-products-" + culture).Exists)
{
    var response = elastic.CreateIndex(
        IndexInstanceName + "-products-" + culture,
        s => s.InitializeUsing(indexState)
               .Mappings(m => m.Map<SearchableProduct>(sc => sc.AutoMap())));
}

await this.IndexProductsAsync(context, products, elastic, culture);
await elastic.RefreshAsync(new RefreshRequest(IndexInstanceName + "-products-" + culture));


For searching, I use the following code:

ISearchResponse<SearchableProduct> result = await elastic.SearchAsync<SearchableProduct>(s => s
    .Index(elasticIndexName + "-products-" + culture)
    .Take(DefaultPageSize)
    .Source(src => src.IncludeAll())
    .Query(query => query
        .QueryString(qs => qs
            .Query(q)
            .DefaultOperator(Operator.And)
            .Fuzziness(Fuzziness.EditDistance(0))
            .Fields(x => x
                .Field(d => d.Name, 2)
                .Field(d => d.MetaTitle, 1)
                .Field(d => d.Image, 1)
                .Field(d => d.SystemId, 2)
                .Field(d => d.Manufacturer, 1))))
    .Sort(d => d.Ascending(SortSpecialField.Score)));

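When I search in Greek for a word with an accent (e.g. παγωτό), I get results, because the product is indexed with the accent. When I search for the same word without the accent (e.g. παγωτο), I get no results.

Is there something wrong with my index settings or my search code?

Can I index my data without the accents, or can I index it as-is and make searching or indexing accent-insensitive?

Best answer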

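Creating the field with the greek analyzer ensures that the indexed text and the query string go through the same analysis chain. For παγωτό, this means the text is reduced to the token παγωτ both at index time and at query time.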
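To check which tokens the analyzer actually produces for both spellings, the Analyze API can be called directly. This is only a minimal sketch, assuming NEST 7.x and a node at http://localhost:9200; the class name AnalyzeCheck is made up for illustration.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Nest;

class AnalyzeCheck
{
    static async Task Main()
    {
        // Assumption: a local single-node cluster, as in the answer's example.
        var client = new ElasticClient(
            new ConnectionSettings(new Uri("http://localhost:9200")));
        Console.OutputEncoding = Encoding.UTF8;

        foreach (var text in new[] { "παγωτό", "παγωτο" })
        {
            // Run the built-in greek analyzer on the text without indexing anything.
            var response = await client.Indices.AnalyzeAsync(a => a
                .Analyzer("greek")
                .Text(text));

            if (!response.IsValid)
            {
                Console.WriteLine($"Analyze request failed for {text}");
                continue;
            }

            // Print every token the analyzer emitted for this input.
            foreach (var token in response.Tokens)
                Console.WriteLine($"{text} -> {token.Token}");
        }
    }
}
```

If both inputs print the same token, a match query on a field using this analyzer will find documents with either spelling.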

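Please check my example, which creates a field with the greek analyzer. When searching for either παγωτό or παγωτο, it returns both documents, the one containing παγωτό and the one containing παγωτο.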

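// Written against the NEST 7.x API (client.Indices.*).
using System;
using System.Text;
using System.Threading.Tasks;
using Elasticsearch.Net;
using Nest;

class Program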
{
    static async Task Main(string[] args)
    {
        var connectionPool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
        var settings = new ConnectionSettings(connectionPool)
            .DefaultIndex("index_name")
            .DisableDirectStreaming()
            .PrettyJson();
        var client = new ElasticClient(settings);

        await client.Indices.DeleteAsync("index_name");

        var createIndexResponse = await client.Indices.CreateAsync("index_name",
            c => c
                .Map(map => map.AutoMap<Document>()));

        await client.IndexManyAsync(new []
            {new Document {Id = 1, Text = "παγωτό"}, new Document {Id = 2, Text = "παγωτο"},});

        await client.Indices.RefreshAsync();

        var query = "παγωτό";
        var searchResponse = await client.SearchAsync<Document>(s => s
            .Query(q => q.Match(m => m.Field(f => f.Text).Query(query))));

        Console.OutputEncoding = Encoding.UTF8;

        Print(query, searchResponse);

        query = "παγωτο";
        var searchResponse2 = await client.SearchAsync<Document>(s => s
            .Query(q => q.Match(m => m.Field(f => f.Text).Query(query))));

        Print(query, searchResponse2);
    }

    private static void Print(string query, ISearchResponse<Document> searchResponse)
    {
        Console.WriteLine($"For {query} found:");
        foreach (var document in searchResponse.Documents)
        {
            Console.WriteLine($"Document {document.Id} {document.Text}");
        }
    }
}

public class Document
{
    public int Id { get; set; }
    [Text(Analyzer = "greek")]
    public string Text { get; set; }
}

Prints:
For παγωτό found:
Document 1 παγωτό
Document 2 παγωτο
For παγωτο found:
Document 1 παγωτό
Document 2 παγωτο
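
On the other part of the question (indexing the data without accents): an alternative to the analyzer, and not what the example above does, is to strip diacritics in application code. Below is a minimal sketch that uses only the .NET base class library; DiacriticsDemo and RemoveDiacritics are hypothetical names, not part of NEST.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper: decompose to NFD, drop combining marks, recompose to NFC.
    public static string RemoveDiacritics(string input)
    {
        var decomposed = input.Normalize(NormalizationForm.FormD);
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // The accented and unaccented spellings should compare equal after stripping.
        Console.WriteLine(RemoveDiacritics("παγωτό") == "παγωτο");
    }
}
```

With this approach the same function would have to be applied both to the text before indexing and to the query string before searching, otherwise the two sides no longer match. It also gives none of the stemming or stop-word handling of the greek analyzer, so the analyzer is usually the simpler route.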

Hope that helps.

Source: a similar question on Stack Overflow about making Elasticsearch diacritics-insensitive in C#: https://stackoverflow.com/questions/58096563/
