c++ - 如何优化路径列表中的目录列表？

在编写 fuse 文件系统时，我有一个 unordered_map<std::string, struct stat>作为缓存，在启动时填充所有文件和目录，以减少硬盘驱动器上的读取。

满足readdir()回调我编写了以下循环:

const int sp = path == "/" ? 0 : path.size();
for (auto it = stat_cache.cbegin(); it != stat_cache.cend(); it++)
{
    if (it->first.size() > sp)
    {
        int ls = it->first.find_last_of('/');
        if (it->first.find(path, 0) == 0 && ls == sp)
            filler(buf, it->first.substr(ls + 1).c_str(), const_cast<struct stat*>(&it->second), 0, FUSE_FILL_DIR_PLUS);
    }
}

这个想法是，路径以目录路径开头并且最后一个斜杠位于目录路径末尾的对象将是它的成员。我已经对此进行了彻底的测试并且它有效。
插图:

Reading directory: /foo/bar
Candidate file:    /bazboo/oof - not in dir (wrong prefix)
Candidate file:    /foo/bar/baz/boo - not in dir (wrong lastslash location)
Candidate file:    /foo/bar/baz - in dir!

然而现在，速度却出奇地慢(特别是在缓存中拥有超过 50 万个对象的文件系统中)。 Valgrind/Callgrind 尤其指责 std::string:find_last_of()和std::string::find()来电。

我已经添加了if (it->first.size() > sp)试图加速循环，但性能增益最多是最小的。

我还尝试通过将循环并行化为四个 block 来加快此例程的速度，但这在 unordered_map::cbegin() 期间以段错误结束。 .
我不再有实际的代码，但我相信它看起来像这样:

const int sp = path == "/" ? 0 : path.size();
ThreadPool<4> tpool;
ulong cq = stat_cache.size()/4;
for (int i = 0; i < 4; i++)
{
    tpool.addTask([&] () {
        auto it = stat_cache.cbegin();
        std::next(it, i * cq);
        for (int j = 0; j < cq && it != stat_cache.cend(); j++, it++)
        {
            if (it->first.size() > sp)
            {
                int ls = it->first.find_last_of('/');
                if (it->first.find(path, 0) == 0 && ls == sp)
                    filler(buf, it->first.substr(ls + 1).c_str(), const_cast<struct stat*>(&it->second), 0, FUSE_FILL_DIR_PLUS);
            }
        }
    });
}
tpool.joinAll();

我还尝试过按 map 桶拆分它，unordered_map::cbegin(int)提供了方便的重载，但仍然出现段错误。

同样，我目前正在使用第一个(非并行)代码，并且希望获得该代码的帮助，因为并行代码不起作用。我只是想我应该包括我对完整性、新手攻击和努力证明的并行尝试。

还有其他选项可以优化此循环吗？

最佳答案

这里要做的一件小事就是更改 if从此:

if (it->first.find(path, 0) == 0 && ls == sp)

简单地说:

if (ls == sp && it->first.find(path, 0) == 0)

显然，比较两个整数比查找子字符串要快得多。
我不能保证它会改善性能，但这是一件微不足道的事情，可能有助于跳过很多不必要的 std::string::find来电。也许编译器已经这样做了，我会研究反汇编。

此外，由于文件路径无论如何都是唯一的，我会使用 std::vector<std::pair<...>>相反 - 更好的缓存局部性、更少的内存分配等。只需记住首先保留大小即可。

关于c++ - 如何优化路径列表中的目录列表？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/46569384/

c++ - 如何优化路径列表中的目录列表？

上一篇：C++ - 更新 map 上的值

下一篇：c++ - boost::shared_ptr 中的 ref_count