cuda - Does CUDA automatically convert a float4 array into a structure of arrays?

I have the following code snippet:

#include <stdio.h>

struct Nonsense {
    float3 group;
    float other;
};

__global__ void coalesced(float4* float4Array, Nonsense* nonsenseArray) {
    float4 someCoordinate = float4Array[threadIdx.x];
    someCoordinate.x = 5;
    float4Array[threadIdx.x] = someCoordinate;

    Nonsense nonsenseValue = nonsenseArray[threadIdx.x];
    nonsenseValue.other = 3;
    nonsenseArray[threadIdx.x] = nonsenseValue;
}

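int main() {
    // Allocate and zero 32 float4 elements, one per thread of the warp.
    float4* float4Array;
    cudaMalloc(&float4Array, 32 * sizeof(float4));
    cudaMemset(float4Array, 0, 32 * sizeof(float4));  // cudaMemset(ptr, value, count)

    // Same for the 32 Nonsense elements.
    Nonsense* nonsenseArray;
    cudaMalloc(&nonsenseArray, 32 * sizeof(Nonsense));
    cudaMemset(nonsenseArray, 0, 32 * sizeof(Nonsense));

    coalesced<<<1, 32>>>(float4Array, nonsenseArray);
    cudaDeviceSynchronize();

    cudaFree(float4Array);
    cudaFree(nonsenseArray);
    return 0;
}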

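When I run this through the Nvidia profiler in Nsight and look at the global memory access pattern, float4Array has perfectly coalesced reads and writes. The Nonsense array, meanwhile, has a poor access pattern (because it is an array of structures).

Does NVCC automatically convert the float4 array, which is conceptually an array of structures, into a structure of arrays to get a better memory access pattern?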

Best answer

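No, it does not convert it into a structure of arrays. I think that if you consider this carefully, you will conclude that it is nearly impossible for the compiler to reorganize the data this way. After all, the thing being passed is a pointer.

There is only one array, and the elements of that array still hold the struct members in the same order: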

float address (i.e. index):      0      1      2      3      4      5 ...
array element             : a[0].x a[0].y a[0].z a[0].w a[1].x a[1].y ...

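However, the float4 array gives a better pattern because the compiler generates a single 16-byte load per thread. This is sometimes called a "vector load", because we are loading a vector (in this case a float4) per thread. Adjacent threads are therefore still reading adjacent data, and you get ideal coalescing behavior. In the example above, thread 0 reads a[0].x, a[0].y, a[0].z and a[0].w, thread 1 reads a[1].x, a[1].y, and so on. All of this happens in a single request (i.e. one SASS instruction), although that request may be split into multiple transactions. Splitting a request into multiple transactions does not cause any loss of efficiency (in this case).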
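As a rough sketch of what makes the single 16-byte load possible: in the CUDA headers float4 is declared with 16-byte alignment, whereas a struct built from a float3 and a float has the same size but only 4-byte alignment, which is the likely reason the compiler cannot assume a 16-byte access is legal for it. The helper floatIndex and the kernel scaleAll are hypothetical names introduced only for this illustration; floatIndex reproduces the index mapping from the diagram above.

```cuda
#include <cuda_runtime.h>  // float3, float4

struct Nonsense {
    float3 group;
    float other;
};

// Same size, different alignment guarantees.
static_assert(sizeof(float4) == 16, "float4 is four packed floats");
static_assert(alignof(float4) == 16, "float4 is declared 16-byte aligned");
static_assert(sizeof(Nonsense) == 16, "Nonsense is also four packed floats");
static_assert(alignof(Nonsense) == 4, "Nonsense is only 4-byte aligned");

// Hypothetical helper: float index of component c (0 = x ... 3 = w) of
// array element i, matching the layout diagram.
__host__ __device__ constexpr int floatIndex(int element, int component) {
    return 4 * element + component;
}
static_assert(floatIndex(1, 1) == 5, "a[1].y sits at float index 5");

// Hypothetical kernel: every component is read and written, so each thread
// needs all 16 bytes of its element. The compiler is expected (not
// guaranteed) to emit one 128-bit load and one 128-bit store per thread.
__global__ void scaleAll(float4* data) {
    float4 v = data[threadIdx.x];
    v.x *= 2.0f;
    v.y *= 2.0f;
    v.z *= 2.0f;
    v.w *= 2.0f;
    data[threadIdx.x] = v;
}
```

Whether a 128-bit instruction is actually generated for a given build can be checked by disassembling the binary, for example with cuobjdump -sass.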

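In the case of the Nonsense struct, the compiler does not recognize that the struct could be loaded in a similar way, so under the hood it must generate 3 or 4 loads per thread:

One 8-byte load (or two 4-byte loads) for the first two words of float3 group
One 4-byte load for the last word of float3 group
One 4-byte load for float other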

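If you chart out these loads for each thread, perhaps using the diagram above, you will see that each load involves a stride (unused elements between the items loaded by adjacent threads), and therefore results in lower efficiency.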
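The stride can be made concrete with a little arithmetic. The sketch below assumes the three-load split described above (byte offsets 0, 8 and 12 within each 16-byte element); loadAddress and usedFraction are hypothetical helpers, and the fraction is only a per-request estimate that ignores any reuse from caches between the loads.

```cuda
// Each Nonsense element occupies 16 bytes, so for any one of the split
// loads, adjacent threads start 16 bytes apart.
constexpr int kElementBytes = 16;

// Hypothetical helper: first byte touched by `thread` for a load that
// starts `offsetInElement` bytes into the element.
constexpr int loadAddress(int thread, int offsetInElement) {
    return thread * kElementBytes + offsetInElement;
}

// Hypothetical helper: fraction of the spanned bytes that one warp-wide
// load actually uses when each thread fetches `bytesPerThread` bytes.
constexpr double usedFraction(int bytesPerThread) {
    return static_cast<double>(bytesPerThread) / kElementBytes;
}

// 8-byte load of group.x and group.y: threads 0 and 1 start at bytes 0 and 16.
static_assert(loadAddress(0, 0) == 0 && loadAddress(1, 0) == 16, "stride is 16 bytes");
// 4-byte load of group.z: bytes 8 and 24.
static_assert(loadAddress(0, 8) == 8 && loadAddress(1, 8) == 24, "stride is 16 bytes");
// 4-byte load of other: bytes 12 and 28.
static_assert(loadAddress(0, 12) == 12 && loadAddress(1, 12) == 28, "stride is 16 bytes");

// Per request, only half or a quarter of the spanned bytes are used.
static_assert(usedFraction(8) == 0.5, "8 of every 16 bytes");
static_assert(usedFraction(4) == 0.25, "4 of every 16 bytes");
// A full 16-byte vector load leaves no gap between adjacent threads.
static_assert(usedFraction(16) == 1.0, "16 of every 16 bytes");
```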

With careful typecasting, or with a union definition inside the struct, you can get the compiler to load your Nonsense struct in a single load.
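A minimal sketch of the typecasting route, under stated assumptions: the array pointer is 16-byte aligned (true for pointers returned by cudaMalloc), and Nonsense has the same size and member order as a float4, so .w overlays other. The names AlignedNonsense, addToOther and runSingleLoadDemo are hypothetical. Declaring the struct with __align__(16) is shown as an alternative that gives the compiler the alignment guarantee directly. Reinterpreting the pointer is common practice in CUDA code but steps outside strict C++ aliasing rules, so treat it as an illustration rather than a definitive implementation.

```cuda
#include <cuda_runtime.h>

struct Nonsense {
    float3 group;
    float other;
};
static_assert(sizeof(Nonsense) == sizeof(float4), "the cast relies on equal size");

// Alternative (hypothetical type): same members, but with the 16-byte
// alignment that makes the struct eligible for 16-byte accesses.
struct __align__(16) AlignedNonsense {
    float3 group;
    float other;
};
static_assert(alignof(AlignedNonsense) == 16, "16-byte aligned");

// Reads each element through a float4 view, adds 3 to `other`, and writes
// the element to `out`. Assumes both pointers are 16-byte aligned.
__global__ void addToOther(const Nonsense* in, Nonsense* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const float4* inVec = reinterpret_cast<const float4*>(in);
    float4* outVec = reinterpret_cast<float4*>(out);
    float4 v = inVec[i];  // intended as one 16-byte load
    v.w += 3.0f;          // .w overlays Nonsense::other
    outVec[i] = v;        // intended as one 16-byte store
}

// Host-side check: returns true if every element came back as expected.
bool runSingleLoadDemo() {
    const int n = 32;
    Nonsense host[n];
    for (int i = 0; i < n; ++i) {
        host[i].group = make_float3(1.0f * i, 0.0f, 0.0f);
        host[i].other = 1.0f;
    }
    Nonsense* in = nullptr;
    Nonsense* out = nullptr;
    if (cudaMalloc(&in, sizeof(host)) != cudaSuccess) return false;
    if (cudaMalloc(&out, sizeof(host)) != cudaSuccess) {
        cudaFree(in);
        return false;
    }
    cudaMemcpy(in, host, sizeof(host), cudaMemcpyHostToDevice);
    addToOther<<<1, n>>>(in, out, n);
    bool ok = cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) {
        ok = host[i].group.x == 1.0f * i && host[i].other == 4.0f;
    }
    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

Calling runSingleLoadDemo() from main on a machine with a CUDA device should return true; the access pattern itself still has to be confirmed in the profiler or in the generated SASS.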

This answer also covers some ideas around AoS -> SoA conversion and the associated efficiency gains.

This answer covers the details of vector loads.

Regarding "cuda - Does CUDA automatically convert a float4 array into a structure of arrays?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53127392/
