c++ - Nested OpenMP parallelization combined with std::thread

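Hello StackOverflowers,

I am currently working on a larger project in the field of image processing. I am developing with Visual Studio 2013 (not negotiable). Without bothering you with further details, here is my problem: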

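I have two operations that must run in parallel:

The iterative solution of a system of linear equations (using 1-2 threads)
A fairly complex process involving image-to-image registration (using all remaining threads)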

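To know which images need to be registered, an approximate solution of the linear system is required. That is why the two operations have to run at the same time. (Thanks to Z boson for pointing out that this information was missing.) The iterative solution runs continuously and is notified after each successful image registration.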
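One way such a notification could look, as a minimal sketch only: a mutex-protected mailbox that the registration threads post to and that the solver drains between iterations. `RegistrationResult`, `PostRegistrationResult` and `TakeRegistrationResults` are hypothetical names invented for illustration. Using `std::mutex` from OpenMP worker threads is an assumption that tends to work on common implementations, but it is not something the OpenMP specification promises.

```cpp
#include <mutex>
#include <vector>

// Hypothetical record describing one finished registration.
struct RegistrationResult {
    int a;  // loop index of the registered image
    int b;  // partner image chosen from the approximate solution
};

// Shared mailbox; every access holds g_mailboxMutex.
static std::mutex g_mailboxMutex;
static std::vector<RegistrationResult> g_mailbox;

// Called by a registration thread after a registration succeeds.
void PostRegistrationResult(int a, int b)
{
    RegistrationResult r = { a, b };
    std::lock_guard<std::mutex> lock(g_mailboxMutex);
    g_mailbox.push_back(r);
}

// Called by the solver between iterations; returns the backlog in
// posting order and leaves the mailbox empty.
std::vector<RegistrationResult> TakeRegistrationResults()
{
    std::vector<RegistrationResult> taken;
    std::lock_guard<std::mutex> lock(g_mailboxMutex);
    taken.swap(g_mailbox);
    return taken;
}
```

The solver polls here instead of blocking, which fits an iteration that keeps running anyway; a `std::condition_variable` would be the usual choice if the solver should sleep until new results arrive.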

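The code will run on a 24-core system.

Currently, the image registration is implemented with OpenMP using "#pragma omp parallel for". The iterative solution is started with std::thread and also uses OpenMP "#pragma omp parallel for" internally.

Now I know that, according to the omp documentation, an omp-thread that encounters nested parallelism will use its thread team to execute the code. But I think this will not work in my case, because it is a std::thread that starts the second omp-parallelism.

For better understanding, here is some example code: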

int main()
{
    std::thread * m_Thread = new std::thread(&IterativeSolution);

    #pragma omp parallel for
    for(int a = 0; a < 100; a++)
    {
        int b = GetImageFromApproximateSolution();
        RegisterImages(a,b);
        // Inform IterativeSolution about result of registration
    }
}

void IterativeSolution()
{
    #pragma omp parallel for
    for(int i = 0; i < 2; i++)
    {
        //SolveColumn(i);
    }
}
void RegisterImages(int a, int b)
{
    // Do Registration
}

My question at this point: will the code above create too many threads? If so, would the following code solve the problem?

int main()
{
    // The max is to avoid having less than 1 thread
    int numThreads = max(omp_get_max_threads() - 2, 1); 

    std::thread * m_Thread = new std::thread(&IterativeSolution);

    #pragma omp parallel for num_threads(numThreads)
    for(int a = 0; a < 100; a++)
    {
        int b = GetImageFromApproximateSolution();
        RegisterImages(a,b);
        // Inform IterativeSolution about result of registration
    }
}

void IterativeSolution()
{
    #pragma omp parallel for num_threads(2)
    for(int i = 0; i < 2; i++)
    {
        //SolveColumn(i);
    }
}
void RegisterImages(int a, int b)
{
    // Do Registration
}

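Best answer

According to the OpenMP standard, this results in undefined behaviour. Most implementations I have tested will, in the first example, create 24 threads for each of the two parallel regions, for a total of 48. The second example should not create too many threads, but since it relies on undefined behaviour it may do anything from crashing to turning your computer into a jelly-like substance without warning.

Since you are already using OpenMP, I would suggest making the code conform to the OpenMP standard by simply removing the std::thread and using nested OpenMP parallel regions instead. You can do it like this: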

int main()
{
    // The max is to avoid having less than 1 thread
    int numThreads = max(omp_get_max_threads() - 2, 1); 
    #pragma omp parallel num_threads(2)
    {
        if(omp_get_thread_num() > 0){
            IterativeSolution();
        }else{
            #pragma omp parallel for num_threads(numThreads)
            for(int a = 0; a < 100; a++)
            {
                int b = GetImageFromApproximateSolution();
                RegisterImages(a,b);
                // Inform IterativeSolution about result of registration
            }
        }
    }
}

void IterativeSolution()
{
    #pragma omp parallel for num_threads(2)
    for(int i = 0; i < 2; i++)
    {
        //SolveColumn(i);
    }
}
void RegisterImages(int a, int b)
{
    // Do Registration
}

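You may need to add the environment variable definitions OMP_NESTED=true and OMP_MAX_ACTIVE_LEVELS=2 (or more) to your environment to enable nested regions. This version has the advantage of being fully defined within OpenMP, and it should be portable to any environment that supports nested parallel regions. If your version does not support nested OpenMP parallel regions, then your proposed solution may be the best remaining option.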
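If setting environment variables is inconvenient, nesting can also be requested from code. The following is a minimal sketch, not a definitive recipe: `RegistrationThreadCount` and `EnableNestedParallelism` are hypothetical helper names, and the `_OPENMP` guard is only there so the file still compiles when OpenMP is switched off. It uses `omp_set_nested`, which already exists in OpenMP 2.0 (the level Visual Studio 2013 implements, as far as I know); OpenMP 5.0 deprecates it in favor of `omp_set_max_active_levels`, so newer compilers may emit a deprecation warning.

```cpp
#include <algorithm>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: threads left for image registration after reserving
// some for the iterative solver; never returns less than 1.
int RegistrationThreadCount(int available, int reservedForSolver)
{
    return std::max(available - reservedForSolver, 1);
}

// Hypothetical helper: request nested parallelism from code and report
// whether the runtime says it is enabled.
bool EnableNestedParallelism()
{
#ifdef _OPENMP
    omp_set_nested(1);              // available since OpenMP 2.0
    return omp_get_nested() != 0;   // runtimes without nesting report 0
#else
    return false;                   // built without OpenMP: pragmas are ignored
#endif
}
```

Calling `EnableNestedParallelism()` at the start of `main`, before the outer parallel region, has the same intent as OMP_NESTED=true. If it returns false, the inner regions will most likely run with a single thread each, so it is worth checking the result instead of assuming the requested thread counts were granted.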

For c++ - Nested OpenMP parallelization combined with std::thread, see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/24990798/
