c++ - 为什么用execute::par对 vector 进行排序要比正常排序(gcc 10.1.0)花费更长的时间？

考虑以下代码:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <execution>
#include <functional>
#include <random>
#include <vector>
using namespace std;
using namespace std::chrono;

constexpr size_t NUM_OF_ELEMENTS = 30000000;

// execute lambda and print the execution time
void measure(function<void()> lambda)
{
    auto start = high_resolution_clock::now();
    lambda();
    auto end = high_resolution_clock::now();
    
    printf("%ld\n", duration_cast<microseconds>(end - start).count());
}

int main()
{
    random_device rd;
    mt19937_64 gen(rd());
    // range from INT_MIN to INT_MAX
    uniform_int_distribution<> distr(-2147483648, 2147483647);
    
    vector<int> original;
    original.reserve(NUM_OF_ELEMENTS);
    
    for(size_t i = 0; i < NUM_OF_ELEMENTS; i++)
        original.push_back(distr(gen));
    
    vector<int> the_copy(original.begin(), original.end());
    
    // sort with single thread
    measure([&]{ sort(original.begin(), original.end()); });
    
    // sort with execution::par
    measure([&]{ sort(execution::par, the_copy.begin(), the_copy.end()); });
    return 0;
}

该代码可以归纳为以下几点:

创建随机数生成器

创建随机整数的 vector

创建该 vector

的拷贝

用一个线程对原始 vector 进行排序，并测量执行时间

使用std::execution::par对拷贝进行排序，并测量执行时间

打印执行时间

execution::par版本始终需要更长的时间。 NUM_OF_ELEMENTS具有什么值都没有关系。我尝试将值从100000增加到30000以100000递增。上面的代码产生类似的结果(值以微秒为单位):

9729406    // single thread
10834613   // execution::par

我使用VS Code在带有gcc的Windows 10上编译了代码:g++ -std=c++17 -g ${workspaceFolder}/main.cpp -o ${workspaceFolder}/main.exe对于C++标准库，我使用mingw发行版，可以在here中找到它。
程序版本:GCC 10.1.0 + LLVM/Clang/LLD/LLDB 10.0.0 + MinGW-w64 7.0.0我的处理器有6个核心，执行时我没有运行任何主要程序或后台进程。
首先，我认为这与 vector 的大小有关，但是30 000个元素肯定足够。在完成单个测试之前，它已经运行了10秒钟。

这是怎么回事？

execution::par是否打算像这样使用？

是否必须启用一些编译标志或其他技巧才能使其按预期工作？

最佳答案

在您的代码上运行性能，似乎花费了一点点时间来尝试对数据进行分区。
这只是一个示例，但是我运行了几次，并且一致地认为并行版本需要更长的时间才能将数据划分为多个级别。由于它是递归的，因此很难准确了解最终会增加多少额外的开销。
sort1是非并行排序。
sort2是并行排序。

除此之外，对基本问题的答案是，您需要安装intel线程构建块，以便gcc使用串行算法以外的任何东西。
可以使用sudo apt install libtbb-dev在Linux上快速简单地安装它，然后使用-ltbb对其进行链接

关于c++ - 为什么用execute::par对 vector 进行排序要比正常排序(gcc 10.1.0)花费更长的时间？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/62900329/

c++ - 为什么用execute::par对 vector 进行排序要比正常排序(gcc 10.1.0)花费更长的时间？

上一篇：c++ - 如何在 C++ 中为模板函数实例创建快捷方式？

下一篇：c++ - 此素数测试函数的时间复杂度