c++ - Does parallel_for (Intel TBB) have overhead similar to what we see with std::function?

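The linked question "std::function vs template" has a good discussion of the overhead of std::function. In short, to avoid the roughly 10x overhead caused by heap-allocating the functor passed to the std::function constructor, you have to use std::ref or std::cref.

This example, taken from @CassioNeri's answer, shows how to pass a lambda to std::function by reference.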

float foo(std::function<float(float)> f) { return -1.0f * f(3.3f) + 666.0f; }
auto g = [a,b,c](float arg){ return arg * 0.5f; };  // std::cref rejects temporaries, so name the lambda
foo(std::cref(g));
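To make the by-value versus by-reference difference concrete, here is a minimal sketch that counts how often a functor is copied when it is handed to std::function. The names `Half`, `copies_by_value` and `copies_by_cref` are hypothetical and introduced only for illustration. Whether a copy also triggers a heap allocation depends on the standard library's small-object optimization, so the sketch counts copies and makes no claim about allocations.

```cpp
#include <functional>

// Hypothetical functor that counts how many times it is copied.
// Declaring the copy constructor suppresses the implicit move constructor,
// so every transfer of a Half object shows up in the counter.
struct Half {
    static int copies;
    Half() = default;
    Half(const Half&) { ++copies; }
    float operator()(float x) const { return x * 0.5f; }
};
int Half::copies = 0;

// Same shape as foo in the question: takes std::function by value.
float foo(std::function<float(float)> f) { return -1.0f * f(3.3f) + 666.0f; }

// Copies made when the functor itself is stored inside std::function.
int copies_by_value() {
    Half h;
    Half::copies = 0;
    foo(h);
    return Half::copies;
}

// Copies made when only a std::reference_wrapper to the functor is stored.
int copies_by_cref() {
    Half h;
    Half::copies = 0;
    foo(std::cref(h));
    return Half::copies;
}
```

Calling both helpers from a `main` should report at least one copy for the by-value path (the exact number varies between standard library implementations) and none for the std::cref path, since only the small reference wrapper is stored.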

Now, the Intel Threading Building Blocks library lets you evaluate loops in parallel using lambdas/functors, as in the following example.

Example code:

#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/tbb_thread.h"
#include <algorithm>  // std::fill
#include <cstddef>    // size_t
#include <vector>

int main() {
 // Start the scheduler with one worker per hardware thread.
 tbb::task_scheduler_init init(tbb::tbb_thread::hardware_concurrency());
 std::vector<double> a(1000);
 std::vector<double> c(1000);
 std::vector<double> b(1000);

 std::fill(b.begin(), b.end(), 1);
 std::fill(c.begin(), c.end(), 1);

 // Loop body: each call handles one sub-range [r.begin(), r.end()).
 auto f = [&](const tbb::blocked_range<size_t>& r) {
  for(size_t j=r.begin(); j!=r.end(); ++j) a[j] = b[j] + c[j];
 };
 tbb::parallel_for(tbb::blocked_range<size_t>(0, 1000), f);
 return 0;
}

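So my question is: does Intel TBB's parallel_for have the same kind of overhead (heap allocation of the functor) that we see with std::function? Should I pass my functors/lambdas to parallel_for by reference using std::cref to speed up the code?

Best answer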

Should I pass my functors/lambdas by reference to parallel_for using std::cref to speed up the code?

I don't know the answer to your main question. But it doesn't matter, because you should never do this with tbb::parallel_for.

As Cassio Neri pointed out in his answer:

Finally, notice that the lifetime of the lambda encloses that of the std::function.

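For the case in the question he was answering, that was true. But for tbb::parallel_for it is not. The whole point of parallel_for is that it will call the given function at some arbitrary time in the future, and from other threads.

If you give it a functor by reference, then you must make sure the functor's lifetime lasts until parallel_for has finished. Otherwise, parallel_for may try to call through a reference to a destroyed object.

That's bad.

So whatever overhead there is, you cannot fix it by passing by reference.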
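The lifetime concern described above can be sketched without TBB. The following is a minimal illustration under stated assumptions: `pending` is a hypothetical stand-in for any code that stores a callable and invokes it later, and `Counted` is a hypothetical functor that tracks how many of its instances are alive. It does not model how parallel_for is implemented. It only shows what a stored std::cref refers to once the original functor has gone out of scope.

```cpp
#include <functional>
#include <vector>

// Hypothetical functor that tracks how many instances currently exist.
struct Counted {
    static int live;
    Counted() { ++live; }
    Counted(const Counted&) { ++live; }
    ~Counted() { --live; }
    int operator()() const { return 42; }
};
int Counted::live = 0;

// Stand-in for code that keeps a callable around and runs it later.
std::vector<std::function<int()>> pending;

// Stores a copy of the functor: the copy lives as long as `pending` holds it.
void enqueue_by_value() {
    Counted c;
    pending.emplace_back(c);
}

// Stores only a reference: `c` is destroyed when this function returns,
// so the queued callable refers to an object that no longer exists.
void enqueue_by_cref() {
    Counted c;
    pending.emplace_back(std::cref(c));
}

// Clears the queue, enqueues once, and reports how many Counted objects
// are still alive behind the single queued callable.
int live_after(void (*enqueue)()) {
    pending.clear();
    enqueue();
    return Counted::live;
}
```

With `enqueue_by_value` one `Counted` object survives inside the queue, so invoking `pending[0]()` later is safe. With `enqueue_by_cref` the queue still holds one callable but zero `Counted` objects are alive, so invoking it would be undefined behavior. The sketch deliberately never makes that call.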

Original question on Stack Overflow: https://stackoverflow.com/questions/18554256/
