c++ - MPI_Scatterv and MPI_Gatherv for multiple 3D arrays

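I am new to programming, and to MPI in particular. I am trying to scatter multiple arrays from the root processor to the other processors, perform some operations on those arrays, and then gather the data. However, it scatters all of the data to all of the processors and the output adjacency matrices are incorrect, so I assume I am using scatterv and/or gatherv wrongly. I am not sure whether I should scatter the matrices element by element or whether there is a way to scatter an entire matrix. I would appreciate it if you could take a look at my code. Thanks!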

int rank, size;
MPI_Status status;
MPI_Datatype strip;
bool passflag[Nmats];


MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
int sendcounts[size], recvcounts, displs[size], rcounts[size];

if(rank == root){

    fin.open(infname);
    fout.open(outfname);
    /* INPUT ADJ-MATS */
    for(i = 0; i < Nmats; i++){
        fin >> dummy;
        for (j = 0; j < N; j++){
            for (k = 0; k < N; k++) {
                fin >> a[i][j][k];
            }
        }
    }
}
/* Nmats = Number of matrices; N = nodes; Nmats isn't divisible by the number of processors */

Nmin= Nmats/size;
Nextra = Nmats%size;
k=0;
for(i=0; i<size; i++){
    if( i < Nextra) sendcounts[i] = Nmin + 1;
    else sendcounts[i] = Nmin;
    displs[i] = k;
    k = k + sendcounts[i];
}
recvcounts = sendcounts[rank];
MPI_Type_vector(Nmin, N, N, MPI_FLOAT, &strip);
MPI_Type_commit(&strip);

MPI_Scatterv(a, sendcounts, displs, strip, a, N*N, strip, 0, MPI_COMM_WORLD);

/* Perform operations on adj-mats */

for(i=0; i<size; i++){
    if(i<Nextra) rcounts[i] = Nmin + 1;
    else rcounts[i] = Nextra;
    displs[i] = k;
    k = k + rcounts[i];

}


MPI_Gatherv(&passflag, 1, MPI::BOOL, &passflag, rcounts , displs, MPI::BOOL, 0, MPI_COMM_WORLD);

MPI::Finalize();
//OUTPUT ADJ_MATS
for(i = 0; i < Nmats; i++) if (passflag[i]) {
    for(j=0;j<N; j++){
        for(k=0; k<N; k++){
            fout << a[i][j][k] << " ";
        }
        fout << endl;
    }
    fout << endl;
}
fout << endl;

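Hello, I was able to get the code working with static allocation, but when I tried to allocate the array dynamically the code more or less "broke". I am not sure whether I need to allocate the memory outside of MPI or whether it is something I should do after initializing MPI. Any suggestions would be appreciated!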

//int a[Nmats][N][N];

/* Prior to adding this part of the code it ran fine, now it's no longer working */ 
int *** a = new int**[Nmats];
for(i = 0; i < Nmats; ++i){
    a[i] = new int*[N];
    for(j = 0; j < N; ++j){
        a[i][j] = new int[N];
        for(k = 0; k < N; k++){
            a[i][j][k] = 0;
        }
    }
}

int rank, size;
MPI_Status status;
MPI_Datatype plane;
bool passflag[Nmats];


MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
MPI_Type_contiguous(N*N, MPI_INT, &plane);
MPI_Type_commit(&plane);

int counts[size], recvcounts, displs[size+1];

if(rank == root){

    fin.open(infname);
    fout.open(outfname);
    /* INPUT ADJ-MATS */
    for(i = 0; i < Nmats; i++){
        fin >> dummy;
        for (j = 0; j < N; j++){
            for (k = 0; k < N; k++) {
                fin >> a[i][j][k];
            }
        }
    }

}


Nmin= Nmats/size;
Nextra = Nmats%size;
k=0;
for(i=0; i<size; i++){
   if( i < Nextra) counts[i] = Nmin + 1;
   else counts[i] = Nmin;
   displs[i] = k;
   k = k + counts[i];
}   
recvcounts = counts[rank];
displs[size] = Nmats;                        

MPI_Scatterv(&a[displs[rank]][0][0], counts, displs, plane,
             &a[displs[rank]][0][0], recvcounts, plane, 0, MPI_COMM_WORLD);

/* Perform operations on matrices */

MPI_Gatherv(&passflag[displs[rank]], counts, MPI::BOOL, &passflag[displs[rank]], &counts[rank], displs, MPI::BOOL, 0, MPI_COMM_WORLD);

MPI_Type_free(&plane);  
MPI::Finalize();

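Best answer

It appears that what you have in a is actually Nmats planes of N x N elements each. The way you index a while filling its elements in the nested loops shows that those matrices are laid out contiguously in memory. Therefore you should treat a as an array of Nmats elements, with each element being a compound one made of N*N values. You just have to register a contiguous type that spans the memory of a single matrix: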

MPI_Type_contiguous(N*N, MPI_FLOAT, &plane);
MPI_Type_commit(&plane);

Scattering the data without using an additional array is done with the in-place mode of the scatter operation:

// Perform an in-place scatter
if (rank == 0)
   MPI_Scatterv(a, sendcounts, displs, plane,
                MPI_IN_PLACE, 0, plane, 0, MPI_COMM_WORLD);
   //                         ^^^^^^^^ ignored because of MPI_IN_PLACE
else
   MPI_Scatterv(a, sendcounts, displs, plane,
   //           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ignored by non-root ranks
                a, sendcounts[rank], plane, 0, MPI_COMM_WORLD);
   //              ^^^^^^^^^^^^^^^^ !!!

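Note that each rank must specify the correct number of planes it is to receive by providing the respective element from sendcounts[] (in your code the receive count was fixed to N*N).

The in-place mode should also be used in the gather operation: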

// MPI_CXX_BOOL is the predefined datatype matching C++ bool (MPI-3.0 and later)
if (rank == 0)
   MPI_Gatherv(MPI_IN_PLACE, 0, MPI_CXX_BOOL,
   //                        ^^^^^^^^^^^^^^^^ ignored because of MPI_IN_PLACE
               passflag, rcounts, displs, MPI_CXX_BOOL, 0, MPI_COMM_WORLD);
else
   MPI_Gatherv(passflag, rcounts[rank], MPI_CXX_BOOL,
   //                    ^^^^^^^^^^^^^ !!!
               passflag, rcounts, displs, MPI_CXX_BOOL, 0, MPI_COMM_WORLD);
   //          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ignored by non-root ranks

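Note that rcounts and sendcounts have essentially the same values, so you do not have to compute them twice. Simply name the array counts and use it in both the MPI_Scatterv and the MPI_Gatherv calls. The same applies to the values of displs: do not compute them twice, since they are the same. You also do not appear to reset k to zero before the second computation (although that may simply not be shown in the code posted here).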

Regarding "c++ - MPI_Scatterv and MPI_Gatherv for multiple 3D arrays", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24633337/
