c++ - Custom allocator for vector<char> is surprisingly inefficient

I want to use a vector with the custom allocator below, in which construct() and destroy() have empty bodies:

struct MyAllocator : public std::allocator<char> {
    typedef allocator<char> Alloc;
    //void destroy(Alloc::pointer p) {} // pre-C++11
    //void construct(Alloc::pointer p, Alloc::const_reference val) {} // pre-C++11
    template< class U > void destroy(U* p) {}
    template< class U, class... Args > void construct(U* p, Args&&... args) {}
    template<typename U> struct rebind {typedef MyAllocator other;};
};

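Now, for reasons I specified in another question, the vector has to be resized many times in a loop. To simplify my performance tests, I wrote a very simple loop like the following: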

std::vector<char, MyAllocator> v;
v.reserve(1000000); // or more. Make sure there is always enough allocated memory
while (true) {
   v.resize(1000000);
   // sleep for 10 ms
   v.clear(); // or v.resize(0);
}

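I noticed that repeatedly growing and shrinking the vector raises CPU consumption from 30% to 80%, even though the allocator has empty construct() and destroy() member functions. Because of that, I would have expected very little or no impact on performance (with optimization enabled). How is this increase possible? A second question: why, when reading the memory after any resize, is the value of every char in the resized memory 0? (I would expect some non-zero values, since construct() does nothing.)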

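My environment is g++ 4.7.0 with -O3 optimization enabled, on an Intel dual-core PC with 4 GB of free memory. Apparently the calls to construct() cannot be optimized away at all?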

Best answer

Updated

This is a complete rewrite. There was an error in the original post/my answer that made me benchmark the same allocator twice. Oops.

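Well, I can see a huge difference in performance. I made the following test bed, which takes a few precautions to make sure the crucial parts are not optimized away entirely. I then verified (using -O0 -fno-inline) that the allocator's construct and destroy calls are invoked the expected number of times (they are):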

#include <cstdlib>  // rand
#include <memory>   // std::allocator
#include <vector>

template<typename T>
struct MyAllocator : public std::allocator<T> {
    typedef std::allocator<T> Alloc;
    //void destroy(Alloc::pointer p) {} // pre-C++11
    //void construct(Alloc::pointer p, Alloc::const_reference val) {} // pre-C++11
    template< class U > void destroy(U* p) {}
    template< class U, class... Args > void construct(U* p, Args&&... args) {}
    template<typename U> struct rebind {typedef MyAllocator other;};
};

int main()
{
    typedef char T;
#ifdef OWN_ALLOCATOR
    std::vector<T, MyAllocator<T> > v;
#else
    std::vector<T> v;
#endif
    volatile unsigned long long x = 0;
    v.reserve(1000000); // or more. Make sure there is always enough allocated memory
    for (auto i = 0ul; i < (1ul << 18); i++) {
        v.resize(1000000);
        x += v[rand() % v.size()]; // append ._x when T is NonTrivial
        v.clear(); // or v.resize(0);
    }
}
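
The call-count check mentioned before the test bed can be sketched roughly as follows. This is a minimal sketch, not the code actually used for the answer: CountingAllocator, CallCounts, g_counts and count_calls are hypothetical names introduced here, and the allocator is written standalone (not derived from std::allocator) on the assumption that this keeps it buildable under newer standards as well.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical call counters (not part of the original test bed).
struct CallCounts {
    std::size_t constructs;
    std::size_t destroys;
};

static CallCounts g_counts = {0, 0};

// Minimal C++11 allocator whose construct()/destroy() only count calls.
template <typename T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }

    // Like MyAllocator, these never touch the element itself.
    template <typename U, typename... Args>
    void construct(U*, Args&&...) { ++g_counts.constructs; }
    template <typename U>
    void destroy(U*) { ++g_counts.destroys; }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Runs `rounds` resize/clear cycles of `n` chars and reports the call counts.
inline CallCounts count_calls(std::size_t n, std::size_t rounds) {
    g_counts = CallCounts{0, 0};
    std::vector<char, CountingAllocator<char> > v;
    v.reserve(n);  // capacity is fixed up front, as in the test bed
    for (std::size_t i = 0; i < rounds; ++i) {
        v.resize(n);  // expected: n construct() calls
        v.clear();    // expected: n destroy() calls
    }
    return g_counts;
}
```

Calling count_calls(n, rounds) should report n * rounds for both counters if the container routes every element through the allocator, which is the property the -O0 -fno-inline check was meant to confirm.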

The timing difference is marked:

g++ -g -O3 -std=c++0x -I ~/custom/boost/ test.cpp -o test 

real    0m9.300s
user    0m9.289s
sys     0m0.000s

g++ -g -O3 -std=c++0x -DOWN_ALLOCATOR -I ~/custom/boost/ test.cpp -o test 

real    0m0.004s
user    0m0.000s
sys     0m0.000s

I can only assume that what you are seeing is related to the standard library optimizing allocator operations for char (it being a POD type).
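
As a rough sanity check on the timings above (my own arithmetic, not part of the original measurements): resize() value-initializes new elements, so with the default allocator each iteration has to write 1000000 zero bytes, and the loop runs 2^18 times. Assuming essentially all of the 9.3 s goes into that zero-filling:

```latex
\[
2^{18} \times 10^{6}\ \text{bytes} \approx 2.62 \times 10^{11}\ \text{bytes},
\qquad
\frac{2.62 \times 10^{11}\ \text{bytes}}{9.3\ \text{s}}
\approx 2.8 \times 10^{10}\ \text{bytes/s} \approx 28\ \text{GB/s}
\]
```

That would be consistent with the default path amounting to a bulk fill of a 1 MB buffer per resize, while the no-op construct() leaves nothing to do.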

The timings get even further apart when you use

struct NonTrivial
{
    NonTrivial() { _x = 42; }
    virtual ~NonTrivial() {}
    char _x;
};

typedef NonTrivial T;

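In this case, the default allocator takes more than 2 minutes (it is still running), whereas the "dummy" MyAllocator takes about 0.006 s. (Note that this invokes undefined behavior, because it references elements that have not been properly initialized.)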
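
If the goal is only to avoid the zero-fill for trivial types without running into the undefined behavior noted above, one alternative (a different technique from the empty construct() in the question) is an allocator adaptor that default-initializes instead of value-initializing. The following is a minimal sketch under that assumption; DefaultInitAllocator, Tagged and the two helper functions are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor (hypothetical name): construct() without arguments
// default-initializes the element instead of value-initializing it.
template <typename T, typename A = std::allocator<T> >
class DefaultInitAllocator : public A {
    typedef std::allocator_traits<A> traits;

public:
    template <typename U>
    struct rebind {
        typedef DefaultInitAllocator<U, typename traits::template rebind_alloc<U> > other;
    };

    using A::A;  // reuse the wrapped allocator's constructors

    // No-argument case: `new (p) U;` runs U's default constructor if it has
    // one, and leaves trivial types such as char uninitialized (no zeroing).
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Every other case is forwarded to the wrapped allocator unchanged.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Illustrative type with a non-trivial default constructor.
struct Tagged {
    int value;
    Tagged() : value(42) {}
};

// Returns true if resize() still ran Tagged's constructor for every element.
inline bool constructors_still_run(std::size_t n) {
    std::vector<Tagged, DefaultInitAllocator<Tagged> > v;
    v.resize(n);
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i].value != 42) return false;
    return v.size() == n;
}

// Returns true if resize(n, value) still fills with the requested value.
inline bool explicit_fill_still_works(std::size_t n) {
    std::vector<char, DefaultInitAllocator<char> > v;
    v.resize(n, 'x');
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] != 'x') return false;
    return v.size() == n;
}
```

With char, a plain resize(n) through this adaptor leaves the new elements with indeterminate values rather than zeros, so they must be written before they are read. Types with real constructors, such as NonTrivial, are still constructed properly.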

Regarding "c++ - Custom allocator for vector<char> is surprisingly inefficient", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15237449/
