OpenCL - float vs. floatN

Is there any advantage to using floatN instead of float in OpenCL?

For example,

float3 position;

versus

float posX, posY, posZ;

Thanks

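Best answer

It depends on the hardware.

NVIDIA GPUs have a scalar architecture, so vectors offer little advantage over writing purely scalar code. Quoting the NVIDIA OpenCL Best Practices Guide (PDF link):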

The CUDA architecture is a scalar architecture. Therefore, there is no performance benefit from using vector types and instructions. These should only be used for convenience. It is also in general better to have more work-items than fewer using large vectors.

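With CPUs and ATI GPUs, you will gain more from using vectors, because these architectures have vector instructions (though I have heard this may be different on the latest Radeons; I wish I had a link to where I read that).

Quoting the ATI Stream OpenCL Programming Guide (PDF link), for CPUs: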

The SIMD floating point resources in a CPU (SSE) require the use of vectorized types (float4) to enable packed SSE code generation and extract good performance from the SIMD hardware.
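
To make the scalar-versus-packed distinction concrete, here is a minimal host-side sketch in plain C rather than OpenCL. It assumes GCC or Clang, whose `vector_size` extension lets a 16-byte `float4` typedef stand in for OpenCL's `float4`. The function names `add_scalar` and `add_float4` are hypothetical and chosen only for illustration. On x86 a compiler will typically lower the vector addition to a packed SSE instruction, but that is up to the compiler, and an optimizing compiler may also auto-vectorize the scalar loop on its own.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension (not standard C). A 16-byte vector
   of four floats, used here as a stand-in for OpenCL's float4. */
typedef float float4 __attribute__((vector_size(16)));

/* Scalar style: one float is added per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Vector style: four floats are added per loop iteration. */
void add_float4(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float4 va, vb, vc;
        /* memcpy keeps the loads and stores valid for unaligned pointers. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;               /* element-wise add of all four lanes */
        memcpy(out + i, &vc, sizeof vc);
    }
    /* Handle the remaining 0 to 3 elements one at a time. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions compute the same result. The difference is only in how much work each operation expresses, which is the property the guide says a CPU's SIMD units need in order to be used well.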

This article provides a performance comparison, on ATI GPUs, of kernels written with vector types versus purely scalar types.

Regarding OpenCL - float vs. floatN, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8933604/
