c++ - Expression templates: improving performance in evaluating expressions?

With the expression templates technique, a matrix expression such as

D = A*B+sin(C)+3.;

achieves nearly the same computational performance as a hand-written for loop.

Now, suppose I have the following two expressions

D = A*B+sin(C)+3.;
F = D*E;
cout << F << "\n";

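In a "classic" implementation of expression templates, the computational performance is almost the same as that of two for loops executed in sequence. This is because an expression is evaluated as soon as the = operator is encountered.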

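My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused and that the only values of interest are the elements of F, so that only the expression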

F = E*(A*B+sin(C)+3.);

is evaluated and the overall performance is equivalent to that of a single for loop?

Of course, such a hypothetical technique should also be able to go back and evaluate the expression

D = A*B+sin(C)+3.;

if the values of D are needed later in the code.

Thank you in advance for any help.

EDIT: Results of experimenting with the solution suggested by Evgeny

Original instruction:

Result D=A*B-sin(C)+3.;

Computing time: 32ms

Two-step instruction:

Result Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;

Computing time: 43ms

Solution with auto:

auto&& Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;

Computing time: 32ms.

In conclusion, auto&& restored the original computing time of the single-instruction case.

EDIT: Summary of the relevant links, following Evgeny's suggestions

Copy Elision

What does auto tell us

Universal References in C++11

C++ Rvalue References Explained

C++ and Beyond 2012: Scott Meyers - Universal References in C++11

Best answer

Evaluation of an expression template typically happens when you store the result in some special type, for example:

Result D = A*B+sin(C)+3.;

The type of the resulting expression:

A*B+sin(C)+3.

is not Result, but something that is convertible to Result. Evaluation happens during that conversion.
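To make the "evaluation happens during conversion" point concrete, here is a minimal sketch of a toy expression template. It is not the asker's library: Expr, AddExpr and evaluate_on_conversion are hypothetical names, Result is reduced to a small vector wrapper, and element-wise + stands in for the richer A*B+sin(C)+3. expression.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// CRTP base: marks a type as an expression and recovers the derived type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node for element-wise addition. It stores references only, so it is
// meant to be consumed within the full expression that created it.
template <typename L, typename R>
struct AddExpr : Expr<AddExpr<L, R>> {
    const L& lhs;
    const R& rhs;
    AddExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Building an expression does no arithmetic; it only records the operands.
template <typename L, typename R>
AddExpr<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return AddExpr<L, R>(l.self(), r.self());
}

// Hypothetical evaluated type, playing the role of Result in the answer.
struct Result : Expr<Result> {
    std::vector<double> data;
    Result(std::size_t n, double v) : data(n, v) {}

    // Converting constructor: the single evaluation loop runs here.
    template <typename E>
    Result(const Expr<E>& e) : data(e.self().size()) {
        const E& x = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = x[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

double evaluate_on_conversion() {
    Result a(4, 1.0), b(4, 2.0), c(4, 3.0);
    // The expression itself is a nested AddExpr, not a Result.
    static_assert(!std::is_same<decltype(a + b + c), Result>::value,
                  "the expression has its own type");
    Result d = a + b + c;  // conversion to Result triggers the evaluation
    return d[0];
}
```

In this sketch the only loop lives in the converting constructor of Result, which is why assigning to Result is the point at which the work is done.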

My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused

Such a "transformation" of:

Result D = A*B+sin(C)+3.;
Result F = D*E;

into

Result F = (A*B+sin(C)+3.)*E;

is possible when you do not evaluate D. To achieve that, you should typically capture D as its real expression type. For example, with the help of auto:

auto &&D = A*B+sin(C)+3.;
Result F = D*E;
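The effect of keeping D as an expression can be checked by counting evaluation loops. The following is a sketch under assumptions, not the library from the question: Expr, AddExpr, Held, g_passes and the two passes_* functions are hypothetical names, + stands in for the original operators, and nested nodes are assumed to be stored by value so that a deferred expression does not refer to destroyed temporaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static int g_passes = 0;  // counts evaluation loops (illustration only)

template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Result;

// Leaves (Result) are held by reference, nested nodes by value.
template <typename T> struct Held { typedef T type; };
template <> struct Held<Result> { typedef const Result& type; };

template <typename L, typename R>
struct AddExpr : Expr<AddExpr<L, R> > {
    typename Held<L>::type lhs;
    typename Held<R>::type rhs;
    AddExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
AddExpr<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return AddExpr<L, R>(l.self(), r.self());
}

struct Result : Expr<Result> {
    std::vector<double> data;
    Result(std::size_t n, double v) : data(n, v) {}
    // Converting constructor: each call is one evaluation loop.
    template <typename E>
    Result(const Expr<E>& e) : data(e.self().size()) {
        ++g_passes;
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Two-step version: D is materialized, so two loops run.
int passes_eager() {
    g_passes = 0;
    Result a(3, 1.0), b(3, 2.0), e(3, 4.0);
    Result d = a + b;  // loop 1
    Result f = d + e;  // loop 2
    assert(f[0] == 7.0);
    return g_passes;
}

// Deferred version: D keeps its expression type, so one loop runs.
int passes_deferred() {
    g_passes = 0;
    Result a(3, 1.0), b(3, 2.0), e(3, 4.0);
    auto&& d = a + b;  // no loop yet; d is an AddExpr<Result, Result>
    Result f = d + e;  // single loop computing a[i] + b[i] + e[i]
    assert(f[0] == 7.0);
    return g_passes;
}
```

Under these assumptions passes_eager() reports two loops and passes_deferred() reports one, which matches the timing difference described in the question's edit.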

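However, you should be careful: expression templates sometimes capture references to their operands, and if one of the operands is an rvalue, it expires right after its expression: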

auto &&D = A*get_large_rvalue();
// At this point, the result of get_large_rvalue has been destroyed
// and D holds an expired (dangling) reference
Result F = D*E;

where get_large_rvalue is:

LargeMatrix get_large_rvalue();

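Its result is an rvalue, which expires at the end of the full expression in which get_large_rvalue was called. If something within the expression stores a pointer/reference to it (for later evaluation) and you "defer" the evaluation, the pointer/reference will outlive the pointed-to/referenced object.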

To prevent this, you should do:

auto &&intermediate = get_large_rvalue(); // it would live till the end of scope
auto &&D = A*intermediate ;
Result F = D*E;
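The lifetime rule behind this fix can be observed with a small instrumented sketch. Everything here is an assumption for illustration: the body of LargeMatrix and get_large_rvalue, the reference-storing node Scaled, the counter g_live and both functions are hypothetical. The unsafe variant deliberately never reads through the dangling reference, since doing so would be undefined behavior.

```cpp
#include <cassert>

static int g_live = 0;  // number of LargeMatrix objects currently alive

// Hypothetical stand-in for the LargeMatrix of the answer.
struct LargeMatrix {
    double value;
    explicit LargeMatrix(double v) : value(v) { ++g_live; }
    LargeMatrix(const LargeMatrix& o) : value(o.value) { ++g_live; }
    ~LargeMatrix() { --g_live; }
};

LargeMatrix get_large_rvalue() { return LargeMatrix(2.0); }

// Expression node that only stores a reference to its operand.
struct Scaled {
    double factor;
    const LargeMatrix& m;
    double eval() const { return factor * m.value; }
};

Scaled operator*(double f, const LargeMatrix& m) { return Scaled{f, m}; }

// Safe: the rvalue is named first, so it lives until the end of the scope.
double safe_deferred() {
    auto&& intermediate = get_large_rvalue();
    auto&& d = 3.0 * intermediate;  // d refers to a live object
    assert(g_live == 1);            // operand still alive at evaluation time
    return d.eval();
}

// Unsafe pattern: the unnamed temporary dies at the end of the declaration.
int live_after_unnamed_temporary() {
    auto&& d = 3.0 * get_large_rvalue();
    (void)d;        // calling d.eval() here would be undefined behavior
    return g_live;  // the operand is already gone
}
```

Only the Scaled node has its lifetime extended by auto&&; the LargeMatrix temporary it refers to is not, which is the trap the answer warns about.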

I'm not familiar with C++11 but, as I understand, auto asks the compiler to determine the type of a variable from its initialization

Yes, exactly. This is called Type Inference/Deduction.

C++98/03 had type deduction only for template functions; C++11 adds auto.

Do you know how do CUDA and C++11 interact each other?

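I have not used CUDA (though I have used OpenCL), but I guess there will not be any problem using C++11 in Host code. Maybe some C++11 features are not supported within Device code, but for your purpose you only need auto in Host code.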

Finally, is there any possibility with only C++?

Do you mean prior to C++11, i.e. C++98/C++03? Yes, it is possible, but it has more syntactic noise, which may be a reason to reject it:

// somewhere
{
    use_D(A*B+sin(C)+3.);
}
// ...
template<typename Expression>
void use_D(Expression D) // depending on your expression template library
                         //   it may be better to use (const Expression &e)
{
    Result F = D*E;
}
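A fuller version of this pre-C++11 idea might look like the sketch below, written without any C++11 features. It is only an illustration: Expr, AddExpr and f_first_element are hypothetical names, use_D is given an extra parameter for E, and + stands in for the original operators. The function template deduces the expression type, which is the job auto does in C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy addition node storing references to its operands.
template <typename L, typename R>
struct AddExpr : Expr<AddExpr<L, R> > {
    const L& lhs;
    const R& rhs;
    AddExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
AddExpr<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return AddExpr<L, R>(l.self(), r.self());
}

struct Result : Expr<Result> {
    std::vector<double> data;
    Result(std::size_t n, double v) : data(n, v) {}
    // Converting constructor: the evaluation loop.
    template <typename E>
    Result(const Expr<E>& e) : data(e.self().size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// D arrives unevaluated; its type is deduced as a template parameter.
template <typename Expression>
Result use_D(const Expr<Expression>& D, const Result& E) {
    return D + E;  // evaluated here, in one loop, while D's operands are alive
}

double f_first_element() {
    Result A(3, 1.0), B(3, 2.0), E(3, 4.0);
    // The temporary node A + B lives until the end of this full expression,
    // so use_D can evaluate it safely.
    Result F = use_D(A + B, E);
    return F[0];
}
```

Because the unevaluated expression is consumed inside the call, the rvalue lifetime issue discussed above does not arise in this form, at the cost of moving the code that uses D into a separate function.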

I'm now using CUDA/Visual Studio 2010 under Windows. Could you please recommend a compiler/toolset/environment for both OSes to use C++11 in the framework of my interest (GPGPU and CUDA, if you know any)?

MSVC 2010 does support some parts of C++11. In particular, it supports auto. So if auto is all you need from C++11, MSVC2010 is OK.

But if you are able to use MSVC2012, I would recommend sticking with it: its C++11 support is much better.

Also, the trick auto &&intermediate = get_large_rvalue(); seems to be not "transparent" to a third party user (which is not supposed to know such an issue). Am I right? Any alternative?

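If an expression template stores references to some values and you defer its evaluation, you should make sure that all of those references are still valid at the point of evaluation. Use any method you like; it can be done without auto, for example: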

LargeMatrix temp = get_large_rvalue();

Or even a global/static variable (a less preferred approach).

A last comment/question: to use auto &&D = A*B+sin(C)+3.; it seems that I should overload the operator= for assignments between two expressions, right?

No, this form requires neither a copy/move assignment operator nor a copy/move constructor.

Basically it just names the temporary and extends its lifetime until the end of the scope. Check this SO.

However, if you use the other form:

auto D = A*B+sin(C)+3.;

then a copy/move/conversion constructor may be required for the code to compile (although the compiler may optimize the actual copy away via copy elision).
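The difference between the two forms can be seen with a type that has no usable copy or move constructor. This is a sketch with hypothetical names (Node, make_node, bind_without_copy), not an expression type from any real library. Note that since C++17 the plain auto form also compiles for a function returning by value, so the commented-out line reflects the C++11/14 rules the answer was written under.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical expression node that can be neither copied nor moved.
struct Node {
    int value;
    Node(int v) : value(v) {}
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;
};

// A braced return constructs the result in place, so no copy/move is needed.
Node make_node(int v) { return {v}; }

int bind_without_copy() {
    auto&& d = make_node(42);  // binds to the temporary and extends its lifetime
    static_assert(std::is_same<decltype(d), Node&&>::value,
                  "auto&& deduces an rvalue reference here");
    static_assert(!std::is_copy_constructible<Node>::value &&
                      !std::is_move_constructible<Node>::value,
                  "Node has no usable copy/move constructor");
    // auto e = make_node(42);  // rejected before C++17: needs a copy/move ctor
    return d.value;
}
```

The auto&& declaration only creates a reference, so nothing about Node itself has to be copyable, movable or assignable.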

Also, switching between using auto (for the intermediate expressions) and Result to force calculation seems to be non-transparent to a third party user. Any alternative?

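I am not sure whether there is an alternative. This is the nature of expression templates. While you use them in expressions, they return some internal intermediate types, but when you store the result in some "special" type, evaluation is triggered.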

Source: the Stack Overflow question "Expression templates: improving performance in evaluating expressions?", https://stackoverflow.com/questions/15856122/
