c - Unable to understand __syncthreads()

In CUDA, a __syncthreads() statement, if present, must be executed by all threads in a block. When a __syncthreads() is placed in an if statement, either all threads in a block execute the path that includes the __syncthreads() or none of them does. For an if-then-else statement, if each path has a __syncthreads() statement, either all threads in a block execute the __syncthreads() on the then path or all of them execute the else path. The two __syncthreads() are different barrier synchronization points. If a thread in a block executes the then path and another executes the else path, they would be waiting at different barrier synchronization points. They would end up waiting for each other forever. It is the responsibility of the programmers to write their code so that these requirements are satisfied.

No examples are given for the if and if-then-else cases, so I can't understand the concept. Please explain both cases to me in simple terms.

PS: I am a beginner in parallel programming and CUDA.



Suppose you have a kernel that is launched with a single thread block of 32 threads.



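__global__ void kernel() {
  if (threadIdx.x < 16) {
    // do something
    __syncthreads();  // barrier A: reached only by threads 0-15
  } else {
    // do something
    __syncthreads();  // barrier B: reached only by threads 16-31
  }
}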

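The first 16 threads of the block take the if branch; the other 16 take the else branch. When the first 16 threads reach their __syncthreads(), they block until every thread in the block has reached that same statement. That never happens, because the other 16 threads are stuck at the __syncthreads() in the else branch, so the block deadlocks.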
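The plain if case from the quoted text follows the same rule: a barrier inside an if is only safe when the condition evaluates identically for every thread in the block. Here is a minimal sketch of that, assuming a single block of 32 threads. The names if_kernel, run_if_kernel and flag are made up for illustration; the violating variant is left commented out, because running it is likely to hang.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: `flag` is a kernel argument, so every thread in the
// block sees the same value and the whole block takes the same path.
__global__ void if_kernel(int *out, int flag)
{
    __shared__ int buf[32];

    if (flag) {
        buf[threadIdx.x] = 2 * threadIdx.x;
        __syncthreads();  // reached by all 32 threads, or by none of them
        out[threadIdx.x] = buf[(threadIdx.x + 1) % 32];  // neighbour's value
    } else {
        out[threadIdx.x] = -1;  // no barrier on this path
    }

    // Not allowed: the condition differs between threads of the same block,
    // so only threads 0..15 would arrive at the barrier.
    // if (threadIdx.x < 16) { __syncthreads(); }
}

// Hypothetical host helper: one block of 32 threads; returns 0 on success.
int run_if_kernel(int flag, int *host_out)
{
    int *dev_out = nullptr;
    if (cudaMalloc((void **)&dev_out, 32 * sizeof(int)) != cudaSuccess) return 1;
    if_kernel<<<1, 32>>>(dev_out, flag);
    cudaError_t err = cudaMemcpy(host_out, dev_out, 32 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int out[32];
    if (run_if_kernel(1, out) != 0) { printf("CUDA error\n"); return 1; }
    printf("flag = 1: out[0] = %d\n", out[0]);
    if (run_if_kernel(0, out) != 0) { printf("CUDA error\n"); return 1; }
    printf("flag = 0: out[0] = %d\n", out[0]);
    return 0;
}
```

Because flag is the same for the whole block, "all threads or none" holds automatically. A condition built from threadIdx.x generally does not have that property.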

You should avoid placing __syncthreads() in separate if and else branches; otherwise you must make sure that the entire thread block takes the same branch!
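One common way to follow this advice is to keep the divergent work inside the branches and move a single __syncthreads() after the if/else, so that every thread arrives at the same barrier. The sketch below assumes one block of 32 threads; safe_kernel, run_safe_kernel and the shared buffer buf are hypothetical names chosen for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the two halves of the block do different work,
// but the barrier itself sits outside the branches.
__global__ void safe_kernel(int *out)
{
    __shared__ int buf[32];

    if (threadIdx.x < 16) {
        buf[threadIdx.x] = threadIdx.x;        // work for threads 0..15
    } else {
        buf[threadIdx.x] = 100 + threadIdx.x;  // work for threads 16..31
    }

    // Every thread of the block reaches this one barrier.
    __syncthreads();

    // After the barrier it is safe to read a value written by another thread.
    out[threadIdx.x] = buf[31 - threadIdx.x];
}

// Hypothetical host helper: one block of 32 threads; returns 0 on success.
int run_safe_kernel(int *host_out)
{
    int *dev_out = nullptr;
    if (cudaMalloc((void **)&dev_out, 32 * sizeof(int)) != cudaSuccess) return 1;
    safe_kernel<<<1, 32>>>(dev_out);
    cudaError_t err = cudaMemcpy(host_out, dev_out, 32 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int result[32];
    if (run_safe_kernel(result) != 0) { printf("CUDA error\n"); return 1; }
    printf("out[0] = %d, out[31] = %d\n", result[0], result[31]);
    return 0;
}
```

Thread 0 reads the value written by thread 31 and vice versa, which only works reliably because the whole block has passed the same barrier first.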

