C++ multithreaded code runs slower than single-threaded code

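I am learning to use threads in C++.
I created a very long vector of integers and set another integer x. I want to compute the difference between that integer and each integer in the vector.

However, in my implementation the function that uses two threads is slower than the single-threaded function. I would like to know why, and how to implement the threading correctly so that it runs faster.

Here is the code: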

#include <iostream>
#include <vector>
#include <thread>
#include <future>
#include <cstdlib>  // abs for integers

using namespace std;


vector<int> vector_generator(int size) {
    vector<int> temp;
    for (int i = 0; i < size; i++) {
        temp.push_back(i);
    }
    return temp;
}

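// Returns |center - input[i]| for every i in [start, end).
// input is a const reference: std::async forwards its stored copy as an rvalue,
// which cannot bind to a non-const lvalue reference.
vector<int> dist_calculation(int center, const vector<int> &input, int start, int end) {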
    vector<int> temp;
    for (int i = start; i < end; i++) {
        temp.push_back(abs(center - input[i]));
    }
    return temp;
}


void multi_dist_calculation(int center, vector<int> &input) {
    int mid = input.size() / 2;

    vector<int> temp1(input.begin(), input.begin() + mid);
    vector<int> temp2(input.begin()+mid, input.end());

    auto future1 = async(dist_calculation, center, temp1, 0, mid);
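    // temp2 holds input.size() - mid elements, which differs from mid when the size is odd.
    auto future2 = async(dist_calculation, center, temp2, 0, static_cast<int>(temp2.size()));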

    vector<int> result1 = future1.get();
    vector<int> result2 = future2.get();

    return;
}


int main() {

    vector<int> v1 = vector_generator(1000000000);
    vector<int> result;
    multi_dist_calculation(0, v1);
    //dist_calculation(50, v1, 0, v1.size());

    return 0;
}

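Update #1

I applied the suggestions to add std::launch::async and reserve(), and that did make the code faster. But the two-thread function is still slower than the single-threaded one. Can I conclude that a single thread is faster for this kind of computation?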

#include <iostream>
#include <vector>
#include <thread>
#include <future>
#include <cstdlib>  // abs for integers

using namespace std;


vector<int> vector_generator(int size) {
    vector<int> temp;
    temp.reserve(size);
    for (int i = 0; i < size; i++) {
        temp.push_back(i);
    }
    return temp;
}

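// Returns |center - input[i]| for every i in [start, end).
// const reference so the copy stored by std::async can bind to the parameter.
vector<int> dist_calculation(int center, const vector<int> &input, int start, int end) {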
    vector<int> temp;
    temp.reserve(end - start);
    for (int i = start; i < end; i++) {
        temp.push_back(abs(center - input[i]));
    }
    return temp;
}


void multi_dist_calculation(int center, vector<int> &input) {
    int mid = input.size() / 2;

    auto future1 = async(std::launch::async, dist_calculation, center, input,   0, mid);
    auto future2 = async(std::launch::async, dist_calculation, center, input, mid, input.size());

    vector<int> result1 = future1.get();
    vector<int> result2 = future2.get();

    return;
}


int main() {

    vector<int> v1 = vector_generator(1000000000);
    vector<int> result;
    int center = 0;
    multi_dist_calculation(center, v1);
    //dist_calculation(center, v1, 0, v1.size());

    return 0;
}

Best answer

You did not pass any std::launch policy to std::async, so the implementation is left with a lot of freedom.

Behaves as if (2) is called with policy being std::launch::async | std::launch::deferred. In other words, f may be executed in another thread or it may be run synchronously when the resulting std::future is queried for a value.
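The difference between the two policies in the quote can be observed directly. The following is a minimal sketch, not part of the original answer; the helper names running_thread_id and ran_on_calling_thread are made up for illustration. It launches a trivial task under a given policy and checks whether the task ran on the thread that called get().

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: reports the id of whichever thread executes the task.
std::thread::id running_thread_id() {
    return std::this_thread::get_id();
}

// Hypothetical helper: true if a task launched with `policy` was executed
// by the same thread that later calls get() on the future.
bool ran_on_calling_thread(std::launch policy) {
    auto fut = std::async(policy, running_thread_id);
    assert(fut.valid());
    return fut.get() == std::this_thread::get_id();
}
```

With std::launch::deferred the helper should return true, because the task only runs inside get(). With std::launch::async it should return false, because the task runs on a separate thread. When no policy is given, either outcome is allowed, which is why stating the policy explicitly matters when you want real parallelism.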

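But also note that, more generally, using more threads may not be faster, especially on small tasks.

If dist_calculation, or whatever task you want to run in a thread, is only a small amount of work, be aware of the overhead. Creating a new thread is relatively expensive, and any internal pool, promises, and futures that std::async uses add overhead as well.
In addition, the way the code is written you are probably creating more vectors, with more dynamic memory, and you still need to merge the results, which also has a cost.
In more complex cases, if synchronization is involved, for example with std::mutex, it may cost more performance than the extra thread gains.
In some cases the bottleneck is not the CPU. It may be disk/storage speed (including the page/swap file), network speed (including remote servers), or even memory bandwidth (NUMA-aware optimizations aside, which are much more complex than just using std::async). Multithreading in these cases only adds overhead and brings no benefit.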
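To make the point about extra vectors and merging concrete, here is a hedged sketch of one way to restructure the two-thread version. It is an illustration, not the asker's code, and the names dist_into and multi_dist are made up. It relies on two facts: std::async stores copies of its arguments, so passing a vector by name copies the whole vector unless it is wrapped in std::cref or std::ref; and two threads may safely write to different elements of the same vector. Each half is written straight into one pre-sized output, so nothing is copied or merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper: writes |center - input[i]| into output[i] for i in [start, end).
void dist_into(int center, const std::vector<int>& input,
               std::vector<int>& output, std::size_t start, std::size_t end) {
    assert(start <= end && end <= input.size());
    assert(output.size() == input.size());
    for (std::size_t i = start; i < end; ++i) {
        output[i] = std::abs(center - input[i]);
    }
}

// Hypothetical two-task version: one shared output, no per-thread copies, no merge.
std::vector<int> multi_dist(int center, const std::vector<int>& input) {
    std::vector<int> output(input.size());
    const std::size_t mid = input.size() / 2;

    // std::cref / std::ref make std::async store references instead of copies.
    auto first_half = std::async(std::launch::async, dist_into, center,
                                 std::cref(input), std::ref(output),
                                 std::size_t{0}, mid);

    // The calling thread handles the second half itself, saving one thread launch.
    dist_into(center, input, output, mid, input.size());

    first_half.get();  // wait for the first half before handing out the result
    return output;
}
```

Even in this form the loop does very little arithmetic per element, so it may remain limited by memory bandwidth, as described above. Timing both versions on the target machine is the only reliable way to tell.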

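Where possible, you should apply other basic optimizations first, such as calling reserve on the vector to avoid unnecessary allocations and copies, or perhaps calling resize and writing vector[index] = a instead of using push_back, and so on.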
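The two single-threaded variants mentioned here can be sketched as follows. The function names dist_push_back and dist_indexed are illustrative only. Which one is faster is an assumption to verify by measurement: sizing the vector up front zero-initializes every element, which is an extra pass over memory, while push_back has to check the capacity on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical variant 1: reserve once, then push_back (one allocation, no regrowth).
std::vector<int> dist_push_back(int center, const std::vector<int>& input) {
    std::vector<int> result;
    result.reserve(input.size());
    for (int value : input) {
        result.push_back(std::abs(center - value));
    }
    return result;
}

// Hypothetical variant 2: size the vector up front, then write by index.
std::vector<int> dist_indexed(int center, const std::vector<int>& input) {
    std::vector<int> result(input.size());  // elements are zero-initialized here
    for (std::size_t i = 0; i < input.size(); ++i) {
        result[i] = std::abs(center - input[i]);
    }
    assert(result.size() == input.size());
    return result;
}
```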

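For something as simple as abs(centre - input[i]), you may well get more improvement from SIMD (single instruction, multiple data) optimizations. For example, make sure you are compiling with optimizations such as SSE2 enabled; if the compiler does not optimize the loop appropriately (I think push_back may interfere, test it!), alter the loop slightly so that it does, or perhaps even use the vector instructions explicitly (for x86, look at _mm_add_epi32 and similar).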
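As a rough illustration of the explicit route, the sketch below processes four ints per iteration with SSE2 intrinsics and falls back to a plain loop where SSE2 is not available. The function name dist_simd and the macro DIST_USE_SSE2 are made up for this example. It uses _mm_sub_epi32 rather than the _mm_add_epi32 named above, and because SSE2 has no 32-bit absolute-value instruction it builds one from a sign mask. With optimizations enabled a compiler may already vectorize the plain indexed loop, so check that first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define DIST_USE_SSE2 1  // hypothetical macro, local to this sketch
#endif

// Hypothetical helper: output[i] = |center - input[i]|, four elements at a time where possible.
std::vector<int> dist_simd(int center, const std::vector<int>& input) {
    std::vector<int> output(input.size());
    assert(output.size() == input.size());
    std::size_t i = 0;
#ifdef DIST_USE_SSE2
    const __m128i c = _mm_set1_epi32(center);  // center replicated into 4 lanes
    for (; i + 4 <= input.size(); i += 4) {
        const __m128i x = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(input.data() + i));
        const __m128i d = _mm_sub_epi32(c, x);        // center - input[i]
        const __m128i sign = _mm_srai_epi32(d, 31);   // all ones in lanes where d < 0
        // (d ^ sign) - sign equals |d| in two's complement arithmetic.
        const __m128i a = _mm_sub_epi32(_mm_xor_si128(d, sign), sign);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(output.data() + i), a);
    }
#endif
    // Scalar loop: handles the tail, or everything when SSE2 is unavailable.
    for (; i < input.size(); ++i) {
        output[i] = std::abs(center - input[i]);
    }
    return output;
}
```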

Source: a similar question on Stack Overflow about C++ multithreaded code running slower than single-threaded code: https://stackoverflow.com/questions/55743459/
