c - Unable to understand __syncthreads()

A quote from the book:

In CUDA, a __syncthreads() statement, if present, must be executed by all threads in a block. When a __syncthreads() is placed in an if statement, either all threads in a block execute the path that includes the __syncthreads() or none of them does. For an if-then-else statement, if each path has a __syncthreads() statement, either all threads in a block execute the __syncthreads() on the then path or all of them execute the else path. The two __syncthreads() are different barrier synchronization points. If a thread in a block executes the then path and another executes the else path, they would be waiting at different barrier synchronization points. They would end up waiting for each other forever. It is the responsibility of the programmers to write their code so that these requirements are satisfied.

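The book gives no example for either the if case or the if-then-else case, so I cannot follow the concept. Please explain both cases to me in simple terms.

PS: I am a beginner in parallel programming and CUDA.

Thanks in advance.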

Best answer

Suppose you have a kernel that is launched with a single thread block of 32 threads.

kernel<<<1, 32>>>();

The kernel code looks like this:

__global__ void kernel()
{
  if (threadIdx.x < 16)
  {
    // do something
    __syncthreads();
  }
  else
  {
    // do something
    __syncthreads();
  }
}

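The first 16 threads of the block take the if branch and the other 16 take the else branch. When each of the first 16 threads reaches its __syncthreads(), it blocks until the whole thread block has arrived at that same barrier. That never happens, because the other 16 threads are waiting at a different barrier in the else branch, so the block deadlocks.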
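The question also asks about a lone if with no else. The same rule applies: the barrier is only safe if the condition evaluates the same way for every thread in the block. Below is a minimal sketch of a common way this goes wrong, a bounds guard wrapped around the barrier, together with one way to restructure it. The kernel name guarded_reverse, the helper run_guarded_reverse, and the sizes are made up for illustration; they do not come from the book or from the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 32

// Hypothetical kernel: reverse the first n elements (n <= BLOCK) through shared memory.
__global__ void guarded_reverse(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK];
    int i = threadIdx.x;

    // Risky form (do not use): when n < BLOCK, threads with i >= n never
    // reach the barrier, so the threads inside the if can wait forever.
    //   if (i < n) { tile[i] = in[i]; __syncthreads(); out[i] = tile[n - 1 - i]; }

    if (i < n) tile[i] = in[i];           // only the guarded work is conditional
    __syncthreads();                      // every thread in the block reaches this
    if (i < n) out[i] = tile[n - 1 - i];  // read a value written by another thread
}

// Hypothetical host helper: runs one block and copies the n results back.
cudaError_t run_guarded_reverse(int *h_out, int n)
{
    int h_in[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_in));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    guarded_reverse<<<1, BLOCK>>>(d_in, d_out, n);

    // The blocking copy also reports errors from the kernel launch above.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}

int main()
{
    int h_out[BLOCK];
    const int n = 20;
    if (run_guarded_reverse(h_out, n) != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }
    printf("%d %d\n", h_out[0], h_out[n - 1]);  // first and last of the reversed range
    return 0;
}
```

With n = 20 the program should print 19 0. Threads 20 to 31 do no useful work here, but they still reach the barrier, which is what keeps the other threads from waiting forever.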

You should avoid placing __syncthreads() in separate if and else branches; otherwise you must make sure that the whole thread block takes the same branch!
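One way to follow that advice, sketched here under assumptions rather than taken from the original answer, is to keep only the differing work inside the branches and to call __syncthreads() once after them, so that all 32 threads arrive at the same barrier. The names branch_then_sync and run_branch_then_sync, and the values written to shared memory, are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 32

// Hypothetical kernel: each half of the block does different work,
// but every thread executes the same single barrier.
__global__ void branch_then_sync(int *out)
{
    __shared__ int buf[BLOCK];

    if (threadIdx.x < 16) {
        buf[threadIdx.x] = threadIdx.x;        // work for the "then" half
    } else {
        buf[threadIdx.x] = 100 + threadIdx.x;  // work for the "else" half
    }

    // One barrier, outside both branches: all 32 threads execute it.
    __syncthreads();

    // Now it is safe to read a value written by a thread from the other branch.
    out[threadIdx.x] = buf[BLOCK - 1 - threadIdx.x];
}

// Hypothetical host helper: launches one block of 32 threads and copies the results back.
cudaError_t run_branch_then_sync(int *h_out)
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, BLOCK * sizeof(int));

    branch_then_sync<<<1, BLOCK>>>(d_out);

    // The blocking copy also reports errors from the kernel launch above.
    cudaError_t err = cudaMemcpy(h_out, d_out, BLOCK * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err;
}

int main()
{
    int h_out[BLOCK];
    if (run_branch_then_sync(h_out) != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }
    printf("%d %d\n", h_out[0], h_out[BLOCK - 1]);
    return 0;
}
```

Thread 0 reads the slot written by thread 31 and thread 31 reads the slot written by thread 0, so the program should print 131 0. The branches still diverge, but the barrier itself is no longer inside either of them.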

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/20613460/
