c++ - Explicit template function instantiation with inlining

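A colleague and I have been debating the benefits of explicit template instantiation for reducing compile times and separating declaration from definition, without hurting the performance of a C++ math library I wrote for use in other projects.

Essentially, I have a library of useful math functions designed to work with primitives such as Vector3, Vector4, Quaternion, etc. All of these are intended to be used with a template parameter of float or double (and, in some cases, int).

So that I don't have to write these functions twice, once for float and once for double, the function implementations are templated, like so: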

template<typename T>
Vector3<T> foo(const Vector4<T>& a, 
               const Quaternion<T>& b) 
{ /* do something... */ }

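They are all defined in .h files (so they're implicitly marked inline). Most of these functions are short, and I hope they get inlined at the point of use during compilation.

The headers are getting rather big, though, compile times are increasing, and it's getting hard to tell that a function exists just by glancing at the header (which is one of the many reasons I like separating declaration from implementation).

So I could use explicit template instantiation in an accompanying .cpp file, like so: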

  //in .h
  template<typename T>
  Vector3<T> foo(const Vector4<T>& a, 
                 const Quaternion<T>& b) 
  { /* do something... */ }

  //in .cpp
  template Vector3<float> foo<float>(const Vector4<float>& a, 
                                     const Quaternion<float>& b);
  template Vector3<double> foo<double>(const Vector4<double>& a, 
                                       const Quaternion<double>& b);

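Should this help with compile times? Would it affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler specific?

An added benefit is that it verifies that the function compiles, even if I'm not using it yet.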
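
A minimal single-file sketch of that compile check, under stated assumptions: the three struct templates are simplified stand-ins for the library's types, the body of foo is a placeholder, and foo_header_defined_demo is a hypothetical helper added only for illustration. The comments mark which part would live in the .h file and which in the .cpp file.

```cpp
#include <cassert>

// Hypothetical stand-ins for the library's types (not from the original post).
template <typename T> struct Vector3    { T x, y, z; };
template <typename T> struct Vector4    { T x, y, z, w; };
template <typename T> struct Quaternion { T x, y, z, w; };

// ---- would live in the .h file: declaration and definition together ----
template <typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    // Placeholder body (an assumption): scale a's xyz components by b.w.
    return Vector3<T>{a.x * b.w, a.y * b.w, a.z * b.w};
}

// ---- would live in the .cpp file: explicit instantiation definitions ----
// These make the compiler instantiate and type-check foo for float and
// double in this translation unit, even if nothing calls foo yet.
template Vector3<float>  foo<float>(const Vector4<float>&,
                                    const Quaternion<float>&);
template Vector3<double> foo<double>(const Vector4<double>&,
                                     const Quaternion<double>&);

// Hypothetical usage check.
void foo_header_defined_demo()
{
    const Vector4<float> a{1.0f, 2.0f, 3.0f, 4.0f};
    const Quaternion<float> b{0.0f, 0.0f, 0.0f, 2.0f};
    const Vector3<float> r = foo(a, b);
    assert(r.x == 2.0f && r.y == 4.0f && r.z == 6.0f);
}
```

Because the definition is still visible in the header in this variant, other translation units that call foo can still instantiate it implicitly, so this layout mainly buys the compile check rather than a guaranteed reduction in instantiation work.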

I could also do this:

  //in .h
  template<typename T>
  Vector3<T> foo(const Vector4<T>& a, 
                 const Quaternion<T>& b);

  //in .cpp
  template<typename T>
  Vector3<T> foo(const Vector4<T>& a, 
                 const Quaternion<T>& b) 
  { /* do something... */ }

  template Vector3<float> foo<float>(const Vector4<float>& a, 
                                     const Quaternion<float>& b);
  template Vector3<double> foo<double>(const Vector4<double>& a, 
                                       const Quaternion<double>& b);

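The same questions apply to this approach:

Should this help with compile times? Would it affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler specific?

Given that the definition is not in the header, I would expect the possibility of inlining to be affected.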

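It nicely separates the declaration and the definition of the templated functions (for specific template parameters) without resorting to something like a .inl file included at the bottom of the .h file. It also hides the implementation from users of the library, which is beneficial (though not strictly necessary yet), while still letting me use templates so that I don't have to implement a function N times.

Is there a way to allow inlining by adjusting the approach?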

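I've found it hard to find answers to these questions by searching Google, and the standard is hard to understand on these topics (at least for me).

By the way, this is expected to compile with VS2010, VS2012, and GCC 4.7.

Any help would be appreciated.

Thanks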

Best answer

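I'm assuming your technique is intended to do the same thing as the answer to this question: Template instantiation effect on compile duration

To achieve the desired result, you also need to prevent automatic instantiation by declaring the explicit instantiations in the header with extern. See Explicit instantiation declaration with extern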

//in .h
template<typename T>
Vector3<T> foo(const Vector4<T>& a, 
               const Quaternion<T>& b);

extern template Vector3<float> foo<float>(const Vector4<float>& a, 
                                          const Quaternion<float>& b);

extern template Vector3<double> foo<double>(const Vector4<double>& a, 
                                            const Quaternion<double>& b);

//in .cpp
template<typename T>
Vector3<T> foo(const Vector4<T>& a, 
               const Quaternion<T>& b) 
{ /* do something...*/ }

template Vector3<float> foo<float>(const Vector4<float>& a, 
                                   const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a, 
                                     const Quaternion<double>& b);
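
The layout above can be tried out as a single compilable file. This is only a sketch: the struct templates are simplified stand-ins for the library's types, the body of foo is a placeholder, and foo_extern_demo is a hypothetical helper. The comments mark where the header would end and the .cpp file would begin.

```cpp
#include <cassert>

// Hypothetical stand-ins for the library's types (not from the original post).
template <typename T> struct Vector3    { T x, y, z; };
template <typename T> struct Vector4    { T x, y, z, w; };
template <typename T> struct Quaternion { T x, y, z, w; };

// ---- would live in the .h file ----
template <typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b);

// Explicit instantiation declarations: they tell every including translation
// unit that these specializations are instantiated elsewhere.
extern template Vector3<float>  foo<float>(const Vector4<float>&,
                                           const Quaternion<float>&);
extern template Vector3<double> foo<double>(const Vector4<double>&,
                                            const Quaternion<double>&);

// ---- would live in the .cpp file ----
template <typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    // Placeholder body (an assumption): scale a's xyz components by b.w.
    return Vector3<T>{a.x * b.w, a.y * b.w, a.z * b.w};
}

// Explicit instantiation definitions: the one place the code is generated.
// Within one translation unit they must come after the extern declarations.
template Vector3<float>  foo<float>(const Vector4<float>&,
                                    const Quaternion<float>&);
template Vector3<double> foo<double>(const Vector4<double>&,
                                     const Quaternion<double>&);

// Hypothetical caller, standing in for client code that only sees the header.
void foo_extern_demo()
{
    const Vector4<double> a{1.0, 2.0, 3.0, 4.0};
    const Quaternion<double> b{0.0, 0.0, 0.0, 2.0};
    const Vector3<double> r = foo(a, b);
    assert(r.x == 2.0 && r.y == 4.0 && r.z == 6.0);
}
```

In a real project, a client calling foo with a type that has no explicit instantiation (for example int here) would compile but fail at link time, since the header carries no definition to instantiate from.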

Should this help with compile times? Would it affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler specific?

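The answer depends heavily on the compiler, and is best determined empirically, but we can generalize.

We can assume that the increase in compile time comes not from the cost of parsing the extra template angle-bracket syntax, but from the cost of the (complex) process of template instantiation. If so, using a given template specialization in multiple translation units should significantly increase compile time only if instantiation is expensive and the compiler performs the instantiation more than once.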

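The C++ standard implicitly allows the compiler to instantiate each unique template specialization only once across all translation units. That is, instantiation of function templates can be deferred and performed after the initial compilation, as described in the Comeau documentation. Whether this optimization is implemented depends on the compiler, but it is definitely not implemented in any version of MSVC prior to 2015.

If your compiler performs instantiation at link time, this technique will prevent inlining unless the compiler supports cross-module inlining. Newer versions of MSVC, GCC, and Clang all support cross-module inlining at link time with an additional linker option (LTCG or LTO). See Can the linker inline functions?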
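
One possible adjustment that keeps inlining available without relying on link-time optimization is a hybrid layout: leave the short, hot functions defined in the header and move only the larger ones out of line. The sketch below is an assumption, not something the question's library is known to do. The names dot, cross and hybrid_demo are hypothetical, Vector3 is a simplified stand-in, and cross merely plays the role of a "larger" function.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's type.
template <typename T> struct Vector3 { T x, y, z; };

// ---- would live in the .h file ----
// Short function: the body stays visible to every translation unit, so the
// optimizer is free to inline it at each call site.
template <typename T>
inline T dot(const Vector3<T>& a, const Vector3<T>& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Larger function: declaration only, with implicit instantiation suppressed.
template <typename T>
Vector3<T> cross(const Vector3<T>& a, const Vector3<T>& b);

extern template Vector3<float>  cross<float>(const Vector3<float>&,
                                             const Vector3<float>&);
extern template Vector3<double> cross<double>(const Vector3<double>&,
                                              const Vector3<double>&);

// ---- would live in the .cpp file ----
// Calls from other translation units can only be inlined with link-time
// optimization, e.g. -flto (GCC, Clang) or /GL plus /LTCG (MSVC).
template <typename T>
Vector3<T> cross(const Vector3<T>& a, const Vector3<T>& b)
{
    return Vector3<T>{a.y * b.z - a.z * b.y,
                      a.z * b.x - a.x * b.z,
                      a.x * b.y - a.y * b.x};
}

template Vector3<float>  cross<float>(const Vector3<float>&,
                                      const Vector3<float>&);
template Vector3<double> cross<double>(const Vector3<double>&,
                                       const Vector3<double>&);

// Hypothetical usage check.
void hybrid_demo()
{
    const Vector3<double> ex{1.0, 0.0, 0.0};
    const Vector3<double> ey{0.0, 1.0, 0.0};
    const Vector3<double> ez = cross(ex, ey);
    assert(ez.x == 0.0 && ez.y == 0.0 && ez.z == 1.0);
    assert(dot(ex, ey) == 0.0);
}
```

Which functions belong on which side of the split is a judgment call best settled by measuring build times and profiling, in line with the empirical advice above.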

Source: a similar question on Stack Overflow, "c++ - Explicit template function instantiation with inlining": https://stackoverflow.com/questions/26392876/
