I have a few questions about #pragma omp for schedule(static) where the chunk size is not specified.
One way to parallelize a loop in OpenMP is to do it manually like this:
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread = omp_get_thread_num();
    const int start = ithread*N/nthreads;
    const int finish = (ithread+1)*N/nthreads;
    for(int i = start; i<finish; i++) {
        //
    }
}
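Is there a good reason not to manually parallelize a loop like this in OpenMP? If I compare these values with #pragma omp for schedule(static), I see that the chunk sizes for a given thread do not always agree, so OpenMP (in GCC) is implementing chunk sizes differently from those defined by start and finish. Why is this?
The start and finish values I defined have several convenient properties. For example, the range of iterations increases with the thread number (with 100 iterations and two threads, the first thread processes iterations 1-50 and the second thread 51-100, and not the other way around).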
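To see why the chunk sizes can disagree, it helps to compare two equally valid ways of splitting N iterations into approximately equal contiguous pieces. The sketch below puts the formula from the question next to a "front-loaded" split that gives the first N % nthreads threads one extra iteration. The front-loaded rule is only an assumption about what an implementation might choose (it resembles what GCC's libgomp appears to do, but the specification leaves the chunk sizes unspecified), and the names range_t, manual_range and front_loaded_range are hypothetical.

```c
#include <assert.h>

/* Half-open iteration range [start, finish) owned by one thread. */
typedef struct { int start; int finish; } range_t;

/* The split from the question: boundaries at floor(ithread * N / nthreads). */
range_t manual_range(int ithread, int nthreads, int N)
{
    assert(nthreads > 0 && ithread >= 0 && ithread < nthreads && N >= 0);
    range_t r;
    /* long long avoids overflow of ithread * N for large N */
    r.start  = (int)((long long)ithread * N / nthreads);
    r.finish = (int)((long long)(ithread + 1) * N / nthreads);
    return r;
}

/* ASSUMPTION: one possible implementation choice, not mandated by the spec.
   The first N % nthreads threads receive one extra iteration. */
range_t front_loaded_range(int ithread, int nthreads, int N)
{
    assert(nthreads > 0 && ithread >= 0 && ithread < nthreads && N >= 0);
    int q = N / nthreads;   /* base chunk size */
    int t = N % nthreads;   /* number of threads that get q + 1 iterations */
    range_t r;
    if (ithread < t) {
        r.start  = ithread * (q + 1);
        r.finish = r.start + q + 1;
    } else {
        r.start  = ithread * q + t;
        r.finish = r.start + q;
    }
    return r;
}
```

For N = 10 and four threads, manual_range yields chunk sizes 2, 3, 2, 3, while front_loaded_range yields 3, 3, 2, 2. Both splits are "approximately equal", so a mismatch between the two does not by itself indicate a bug.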
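Edit: Originally I said exactly one chunk, but after thinking about it, the chunk size can be zero if the number of threads is much larger than N (in which case ithread*N/nthreads = (ithread+1)*N/nthreads). The property I really want is at most one chunk.
Are all these properties guaranteed when using #pragma omp for schedule(static)?
According to the OpenMP specification: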
Programs that depend on which thread executes a particular iteration under any other circumstances are non-conforming.
and
Different loop regions with the same schedule and iteration count, even if they occur in the same parallel region, can distribute iterations among threads differently. The only exception is for the static schedule
For schedule(static) the specification says:
chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number.
Additionally, the specification says for schedule(static):
When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread.
Finally, the specification says for schedule(static):
A compliant implementation of the static schedule must ensure that the same assignment of logical iteration numbers to threads will be used in two loop regions if the following conditions are satisfied: 1) both loop regions have the same number of loop iterations, 2) both loop regions have the same value of chunk_size specified, or both loop regions have no chunk_size specified, 3) both loop regions bind to the same parallel region.
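So if I read this correctly, schedule(static) will have the same convenient properties I listed for start and finish, even though my code relies on which thread executes a particular iteration. Do I interpret this correctly? This seems to be a special case for schedule(static) when the chunk size is not specified.
It's easier to just define start and finish as I did than to try to interpret the specification for this case.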
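Reading the specification can be complemented by a quick experiment. The following sketch records which thread executes each iteration under schedule(static) with no chunk size, and then checks the one property the quoted text does promise, namely that each thread receives at most one contiguous chunk. The helper names record_owners and at_most_one_chunk are hypothetical. The code is meant to be built with OpenMP enabled (for example -fopenmp with GCC); without it the pragmas are ignored and everything runs on a single thread. An observation like this shows what one implementation does; it does not prove what the specification guarantees.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Fills owner[i] with the number of the thread that ran iteration i.
   Returns the team size (1 when compiled without OpenMP). */
int record_owners(int n, int *owner)
{
    int team = 1;
    #pragma omp parallel
    {
        int me = 0;  /* private: declared inside the parallel region */
#ifdef _OPENMP
        me = omp_get_thread_num();
        #pragma omp single
        team = omp_get_num_threads();
#endif
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            owner[i] = me;
        }
    }
    return team;
}

/* Returns 1 if every thread owns at most one contiguous run of iterations. */
int at_most_one_chunk(int n, const int *owner, int team)
{
    int *runs = calloc((size_t)team, sizeof *runs);
    assert(runs != NULL);
    int ok = 1;
    for (int i = 0; i < n; i++) {
        /* a new run starts whenever the owner changes */
        if (i == 0 || owner[i] != owner[i - 1]) {
            if (++runs[owner[i]] > 1) {
                ok = 0;
            }
        }
    }
    free(runs);
    return ok;
}
```

Printing the owner array for a few values of n and team sizes makes it easy to compare the observed boundaries with the start and finish values computed by hand.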
Best answer
Is there a good reason not to manually parallelize a loop like this in OpenMP?
The first thing that comes to mind: the manual version covers only the schedule(static) case.
Are all these properties guaranteed when using #pragma omp for schedule(static)?
Let's look at them one by one:
1.) Each thread gets exactly one chunk
When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread. Note that the size of the chunks is unspecified in this case.
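At most one chunk is not exactly one chunk, so property one is not fulfilled. Aside from that, this is why the size of the chunks is unspecified.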
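The gap between "at most one" and "exactly one" is easy to reproduce with the manual formula from the question. The sketch below counts how many threads end up with an empty range when start = ithread*N/nthreads and finish = (ithread+1)*N/nthreads; the helper name count_empty_chunks is hypothetical, and the code says nothing about how a particular OpenMP runtime distributes the iterations.

```c
#include <assert.h>

/* Number of threads whose range [start, finish) is empty under the
   question's formula start = ithread * N / nthreads. */
int count_empty_chunks(int N, int nthreads)
{
    assert(N >= 0 && nthreads > 0);
    int empty = 0;
    for (int ithread = 0; ithread < nthreads; ithread++) {
        /* long long avoids overflow of ithread * N for large N */
        long long start  = (long long)ithread * N / nthreads;
        long long finish = (long long)(ithread + 1) * N / nthreads;
        if (start == finish) {
            empty++;
        }
    }
    return empty;
}
```

With N = 3 and eight threads, five of the eight threads receive no iterations at all, so only three threads do any work.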
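2.) The range of iterations increases with the thread number (i.e. for 100 iterations with two threads, the first thread will process iterations 1-50 and the second thread 51-100, and not the other way around)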
A compliant implementation of the static schedule must ensure that the same assignment of logical iteration numbers to threads will be used in two loop regions if the following conditions are satisfied:
- both loop regions have the same number of loop iterations
- both loop regions have the same value of chunk_size specified, or both loop regions have no chunk_size specified
- both loop regions bind to the same parallel region.
A data dependence between the same logical iterations in two such loops is guaranteed to be satisfied allowing safe use of the nowait clause (see Section A.10 on page 182 for examples).
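Even though I have never seen anything different from what you describe, I dare say that even property two is not guaranteed, at least not for schedule(static) without a chunk size. It seems to me that, for an iteration space of a given cardinality, the only guarantee is that the same "logical iteration numbers" will be given to the same thread if conditions 1, 2 and 3 are respected. Property two is indeed guaranteed if you specify the chunk size: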
When schedule(static, chunk_size) is specified, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number.
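The round-robin rule quoted above pins down the owner of every logical iteration once chunk_size is given. As a minimal sketch (the function name static_chunk_owner is hypothetical, and logical iterations are numbered from 0), the mapping can be written as:

```c
#include <assert.h>

/* Thread that executes logical iteration i (0-based) under
   schedule(static, chunk_size) with nthreads threads in the team:
   chunk number i / chunk_size, dealt out round-robin by thread number. */
int static_chunk_owner(int i, int chunk_size, int nthreads)
{
    assert(i >= 0 && chunk_size > 0 && nthreads > 0);
    return (i / chunk_size) % nthreads;
}
```

With chunk_size = 2 and three threads, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, and 6-7 wrap around to thread 0 again. Choosing chunk_size so that each thread gets a single chunk therefore fixes both the chunk boundaries and their order.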
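3.) For two for loops over exactly the same range, each thread will run exactly the same iterations
This is indeed guaranteed, and the guarantee is even more general: for two loops with iteration spaces of the same cardinality, the same "logical iteration numbers" will be given to each thread, provided the conditions quoted above hold. Example A.10.2c of the OpenMP 3.1 standard should clarify this point.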
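The following sketch is written in the spirit of that example; it is not a copy of it, and the function name nowait_chain is hypothetical. All three loops have n iterations, use schedule(static) with no chunk size, and bind to the same parallel region, so each thread handles the same logical iteration numbers in every loop. That is what makes dropping the barriers with nowait safe here, including in the third loop, whose bounds are shifted by one while its iteration count is unchanged.

```c
#include <assert.h>

/* a, b, c must hold n elements; y must hold n + 1 elements (y[0] is untouched). */
void nowait_chain(int n, const double *a, double *b, double *c, double *y)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) {
            b[i] = 2.0 * a[i];
        }

        /* Logical iteration k reads b[k], written by the same thread above. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) {
            c[i] = b[i] + 1.0;
        }

        /* Shifted bounds, same iteration count: logical iteration k is
           i = k + 1 and reads c[k], again written by the same thread. */
        #pragma omp for schedule(static) nowait
        for (int i = 1; i <= n; i++) {
            y[i] = 0.5 * c[i - 1];
        }
    }
}
```

If any of the three conditions were broken, for example by giving one loop a different iteration count or an explicit chunk_size, the nowait clauses would no longer be safe and the implicit barriers would have to be restored.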
This question, "OpenMP for schedule(static) with no chunk size specified: chunk size and order of assignment", was originally asked on Stack Overflow: https://stackoverflow.com/questions/18746282/