cuda - Why isn't there a simple atomic increment/decrement in CUDA?

From the CUDA Programming Guide:

unsigned int atomicInc(unsigned int* address,
                       unsigned int val);
reads the 32-bit word old located at the address address in global or shared memory, computes ((old >= val) ? 0 : (old+1)), and stores the result back to memory at the same address. These three operations are performed in one atomic transaction. The function returns old.
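To make the quoted wrap-around rule concrete, here is a minimal sketch. The kernel name `wrap_kernel` and the host helper `run_wrap_demo` are hypothetical names chosen for illustration, and error checking is omitted for brevity. Every thread calls `atomicInc` once on the same counter, so the counter cycles through 0..limit.

```cuda
#include <cuda_runtime.h>

// Each thread bumps the shared counter once. Per the quoted definition,
// atomicInc stores ((old >= limit) ? 0 : old + 1), so the counter wraps
// back to 0 after it has reached `limit`.
__global__ void wrap_kernel(unsigned int* counter, unsigned int limit)
{
    atomicInc(counter, limit);
}

// Hypothetical host helper (not part of the CUDA API): starts the counter
// at 0, performs n_threads increments in one block (so n_threads <= 1024),
// and returns the final counter value.
unsigned int run_wrap_demo(int n_threads, unsigned int limit)
{
    unsigned int* d_counter = nullptr;
    unsigned int h_counter = 0;

    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemcpy(d_counter, &h_counter, sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    wrap_kernel<<<1, n_threads>>>(d_counter, limit);

    // This blocking copy on the default stream waits for the kernel.
    cudaMemcpy(&h_counter, d_counter, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}
```

With `limit = 3` the stored value follows 1, 2, 3, 0, 1, 2, ... so ten increments starting from 0 leave the counter at 2, regardless of the order in which the threads run.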

The documented atomicInc is all nice and dandy. But where is

unsigned int atomicInc(unsigned int* address);

which just increments the value at address and returns the old value? And

void atomicInc(unsigned int* address);

which just increments the value at address and returns nothing?

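Note: Of course I can "roll my own" by wrapping the actual API call, but I would have thought the hardware has such a simpler operation, which might be cheaper.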
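For reference, the "roll my own" wrappers could look roughly like the sketch below, built on `atomicAdd`. The names `atomic_increment_fetch_old`, `atomic_increment`, `count_kernel` and `run_increment_demo` are hypothetical. The two variants need distinct names because C++ does not allow overloads that differ only in return type. Error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical wrapper: adds 1 to *address and returns the previous value.
__device__ unsigned int atomic_increment_fetch_old(unsigned int* address)
{
    return atomicAdd(address, 1u);
}

// Hypothetical wrapper: adds 1 to *address and discards the previous value.
__device__ void atomic_increment(unsigned int* address)
{
    atomicAdd(address, 1u);
}

// Every thread increments the same counter exactly once.
__global__ void count_kernel(unsigned int* counter)
{
    atomic_increment(counter);
}

// Hypothetical host helper: launches n_threads threads in one block
// (so n_threads <= 1024) and returns the final counter value.
unsigned int run_increment_demo(int n_threads)
{
    unsigned int* d_counter = nullptr;
    unsigned int h_counter = 0;

    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemcpy(d_counter, &h_counter, sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    count_kernel<<<1, n_threads>>>(d_counter);

    // This blocking copy on the default stream waits for the kernel.
    cudaMemcpy(&h_counter, d_counter, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}
```

Calling `run_increment_demo(256)` should return 256, since each of the 256 threads adds exactly 1. Calling `atomicInc(address, 0xFFFFFFFFu)` would give the same plain increment, because the wrap condition then coincides with ordinary 32-bit unsigned overflow.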

Best answer

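They haven't implemented simple increment and decrement operations because those operations wouldn't perform any better. Each machine-code instruction in current architectures takes up the same amount of space, namely 64 bits. In other words, there is room in the instruction for a full 32-bit immediate value, and since they already have atomic instructions that support adding a full 32-bit value, they have already spent the transistors.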

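I think the dedicated inc and dec instructions on old processors are now just artifacts of a time when transistors were much more expensive and instruction caches were small, which made it worth the effort to encode instructions in as few bits as possible. My guess is that on new CPUs the inc and dec instructions are implemented internally in terms of a more general addition function and exist mostly for backwards compatibility.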

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/19931815/
