caching - Implementing a text index in memory in C#

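I have a performance-sensitive task and I am considering keeping all objects, about 100,000 items, in memory. (They are persisted in MS SQL, but replicated in memory to improve the performance of complex searches.)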

Searching by key is fast enough, but searching by text, e.g. with Contains, is relatively slow. Each query takes about 30 ms, like this:

IEnumerable<Product> result =
   products.Where(p =>
   p.Title.Contains(itemnames[rnd.Next(itemnames.Length)]));

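I have already tried the in-memory database db4o, but its performance is even worse: about 1.5 seconds per search over 100K items.

What options are there to avoid checking every object's title and do this faster?

What in-memory database could I use to solve this task?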

Best answer

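Do you have the option of changing the data structure in which your products are stored? One way to speed up a Contains search is to store every possible substring of Product.Title in a Dictionary<string, List<Product>>. A search then becomes a single dictionary lookup, O(1) on average, instead of an O(n) scan over all products.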

You can generate every substring like this:

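// Extension methods must be declared inside a static class.
public static IEnumerable<string> AllSubstrings(this string value)
{
    // Pair every start index with every possible length, so that substrings
    // in the middle of the title are produced too, not only prefixes and suffixes.
    for (int start = 0; start < value.Length; start++)
    {
        for (int length = 1; length <= value.Length - start; length++)
        {
            yield return value.Substring(start, length);
        }
    }
}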

Then you can populate your dictionary like this:

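var titleIndex = new Dictionary<string, List<Product>>();

foreach (Product product in products)
{
    // Distinct() (from System.Linq) keeps a product from being listed twice
    // under a substring that occurs more than once in its title.
    foreach (string substring in product.Title.AllSubstrings().Distinct())
    {
        if (!titleIndex.TryGetValue(substring, out var matches))
        {
            matches = new List<Product>();
            titleIndex[substring] = matches;
        }

        matches.Add(product);
    }
}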

Finally, you can perform a search like this:

string searchString = itemnames[rnd.Next(itemnames.Length)];

// A missing key means that no title contains the search string.
if (!titleIndex.TryGetValue(searchString, out var searchResults))
{
    searchResults = new List<Product>();
}
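
To check that the three steps above fit together, here is a minimal self-contained sketch. The Product class with only a Title property, the TitleIndexSketch class name, and the BuildIndex and Search helpers are assumptions made for illustration, not part of the original answer. The substring helper enumerates every contiguous substring so that a lookup returns the same products a Contains scan would.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Product type; only Title is assumed.
public sealed class Product
{
    public Product(string title) { Title = title; }
    public string Title { get; }
}

public static class TitleIndexSketch
{
    // Yields every contiguous, non-empty substring of value.
    public static IEnumerable<string> AllSubstrings(this string value)
    {
        for (int start = 0; start < value.Length; start++)
            for (int length = 1; length <= value.Length - start; length++)
                yield return value.Substring(start, length);
    }

    // Maps each distinct substring to the products whose title contains it.
    public static Dictionary<string, List<Product>> BuildIndex(IEnumerable<Product> products)
    {
        var titleIndex = new Dictionary<string, List<Product>>();
        foreach (Product product in products)
        {
            foreach (string substring in product.Title.AllSubstrings().Distinct())
            {
                if (!titleIndex.TryGetValue(substring, out var matches))
                {
                    matches = new List<Product>();
                    titleIndex[substring] = matches;
                }
                matches.Add(product);
            }
        }
        return titleIndex;
    }

    // Single dictionary lookup; an absent key means no title contains the text.
    public static List<Product> Search(Dictionary<string, List<Product>> titleIndex, string text)
    {
        return titleIndex.TryGetValue(text, out var matches)
            ? matches
            : new List<Product>();
    }
}
```

Building the index once with TitleIndexSketch.BuildIndex(products) and then calling TitleIndexSketch.Search(index, text) should give the same products, in the same order, as products.Where(p => p.Title.Contains(text)). Like Contains, the default dictionary comparer is case-sensitive.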

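Note: as you may have guessed, storing the data this way costs more CPU time up front and uses more RAM, because a title of n characters has up to n(n+1)/2 substrings.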
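
To get a rough feel for that cost, the sketch below counts how many substrings a single title contributes. The IndexCostSketch class and its two helper names are hypothetical, introduced only for this illustration. Real numbers depend on your actual title lengths and on how many substrings repeat within a title.

```csharp
using System;
using System.Collections.Generic;

public static class IndexCostSketch
{
    // Substrings counted by position, i.e. (start, length) pairs: n(n+1)/2.
    public static long SubstringCount(int titleLength)
    {
        return (long)titleLength * (titleLength + 1) / 2;
    }

    // Distinct substrings of one title, which is the number of index
    // entries that title adds once repeats are removed.
    public static int DistinctSubstringCount(string title)
    {
        var seen = new HashSet<string>();
        for (int start = 0; start < title.Length; start++)
            for (int length = 1; length <= title.Length - start; length++)
                seen.Add(title.Substring(start, length));
        return seen.Count;
    }
}
```

For a 20-character title, SubstringCount returns 210, so 100,000 titles of that length could add up to 21 million list entries across the index. Measuring DistinctSubstringCount on a sample of real titles gives a better estimate before committing to this design.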

Regarding caching - Implementing a text index in memory in C#, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5130636/
