c++ - Performance comparison of Eigen::Tensor and Eigen::Matrix

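I want to replace a series of matrices in my code with a single 3-D Eigen::Tensor. With that in mind, I tried to compare the performance of tensors and matrices.

The function "tensorContractTest" below contracts a rank-3 (n,n,n) tensor with a rank-1 tensor of size n (n = 500). This contraction computes n**2 dot products, so in terms of operation count it is equivalent to multiplying two (n,n) matrices (the function "matrixProductTest" below).

When built with Visual Studio 2013, "tensorContractTest" runs about 40 times slower than "matrixProductTest". Perhaps I am missing something. Any help is appreciated.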

#include <unsupported/Eigen/CXX11/Tensor>
using namespace Eigen;

// Contracts a 3-dimensional (n x n x n) tensor with a 1-dimensional (n) tensor
// over the tensor's last index. By the number of operations, it's equivalent
// to multiplication of two (n, n) matrices (matrixProductTest).
Tensor<double, 2>  tensorContractTest(int n)
{
  Tensor<double, 3> a(n, n, n);     a.setConstant(1.);  
  Tensor<double, 1> b(n);           b.setConstant(1.);
  auto indexPair            = array<IndexPair<int>, 1>{IndexPair<int>(2,0)}; 
  Tensor<double, 2> result  = a.contract(b, indexPair); 
  return result;
}

MatrixXd  matrixProductTest(int n)
{ 
  MatrixXd a = MatrixXd::Ones(n, n), result = a * a;
  return result;
}

Best answer

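Although the number of floating-point operations is the same, the memory access patterns are completely different, so the two operations are not comparable at all. In general, matrix-matrix operations are always faster (in terms of FLOPS) than matrix-vector or vector-vector operations, because the former make much better use of the cache, which allows near-optimal use of the CPU's ALUs. In your case, you have to read an n^3 tensor on one side versus two n^2 matrices on the other, so the memory footprints are not comparable at all.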
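To put rough numbers on this, here is a back-of-the-envelope sketch in plain C++ (no Eigen needed). The names Cost, tensorContractCost, matrixProductCost and intensity are made up for illustration, and the counts assume each input element is read from memory at least once and that a multiply-add counts as two operations.

```cpp
#include <cassert>
#include <cstddef>

// Rough operation and memory-traffic counts for the two benchmarks.
struct Cost {
    double flops;      // a multiply-add is counted as 2 operations
    double bytesRead;  // minimum number of input bytes that must be read
};

// Contraction of an (n,n,n) tensor with a length-n vector:
// n^2 dot products of length n, touching every tensor element once.
Cost tensorContractCost(std::size_t n) {
    assert(n > 0);
    const double d = static_cast<double>(n);
    return {2.0 * d * d * d, (d * d * d + d) * sizeof(double)};
}

// Product of two (n,n) matrices: about the same flop count,
// but only 2 * n^2 input elements.
Cost matrixProductCost(std::size_t n) {
    assert(n > 0);
    const double d = static_cast<double>(n);
    return {2.0 * d * d * d, 2.0 * d * d * sizeof(double)};
}

// Floating-point operations per input byte; a higher value means each
// loaded element can be reused more often from cache.
double intensity(const Cost& c) { return c.flops / c.bytesRead; }
```

For n = 500 with 8-byte doubles, the contraction has to stream about 1 GB of input for roughly 0.25 operations per byte, while the matrix product needs only 4 MB of input for 62.5 operations per byte. These are estimates of the minimum traffic, not measurements.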

Internally, Tensor::contract falls back to Eigen's matrix product kernels whenever possible, so performance should be on par.
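One way to see how a contraction relates to a matrix kernel is to look at the flat buffer. The sketch below is plain C++ and is not Eigen's actual implementation; contractLastIndex and matVecView are hypothetical helper names. It assumes the column-major layout that Eigen::Tensor uses by default, so that a(i,j,k) is stored at offset i + n*j + n*n*k.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct contraction over the last index of a column-major (n,n,n) buffer:
// result(i,j) = sum_k a(i,j,k) * b(k), with a(i,j,k) at i + n*j + n*n*k.
std::vector<double> contractLastIndex(const std::vector<double>& a,
                                      const std::vector<double>& b) {
    const std::size_t n = b.size();
    assert(a.size() == n * n * n);
    std::vector<double> result(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                result[i + n * j] += a[i + n * j + n * n * k] * b[k];
    return result;
}

// The same buffer viewed as a column-major (n*n) x n matrix M, where
// M(r,k) lives at r + (n*n)*k. Then result = M * b: a matrix-vector
// product, not a matrix-matrix product.
std::vector<double> matVecView(const std::vector<double>& a,
                               const std::vector<double>& b) {
    const std::size_t n = b.size();
    const std::size_t rows = n * n;
    assert(a.size() == rows * n);
    std::vector<double> result(rows, 0.0);
    for (std::size_t k = 0; k < n; ++k)         // one column of M at a time
        for (std::size_t r = 0; r < rows; ++r)  // contiguous in memory
            result[r] += a[r + rows * k] * b[k];
    return result;
}
```

Both functions return the same n*n values. Seen this way, the benchmark in the question is a matrix-vector product with an (n*n) x n matrix, which is the memory-bound kind of operation described above, so it should not be expected to match the speed of an (n,n) by (n,n) matrix product.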

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/48105543/
