c - Cannot receive a subset of an array using an MPI datatype

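I'm having trouble sending and receiving columns of a 2D array.

I have 2 processes. The first process has a 2D array, and I want to send part of it to the second process. So say each rank has a 9x9 array; I want rank 0 to send just certain columns to rank 1:

Example: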

-1--2--3-
-2--3--4-
-5--6--7-
...

I want to send "1,2,5,..." and "3,4,7,...".

I've written code to send the first column, and having read through this answer I believe I've correctly defined the MPI_Type_vector for that column:

MPI_Type_vector(dime,1,dime-1,MPI_INT,&LEFT_SIDE);

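where dime, here 9, is the size of the array. I'm sending 9 blocks of 1 MPI_INT, each separated by a stride of 8, but even sending just this one column gives me invalid results.

My code is as follows: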

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define dime 9

int main (int argc, char *argv[])
{
    int size,rank;
    const int ltag=2;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);       // Get the number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       // Get the rank of the process

    int table[dime][dime];
    for (int i=0; i<dime; i++)
        for (int j=0; j<dime; j++)
            table[i][j] = rank;

    int message[dime];

    MPI_Datatype LEFT_SIDE;
    MPI_Type_vector(dime,1,dime-1,MPI_INT,&LEFT_SIDE);
    MPI_Type_commit(&LEFT_SIDE);

    if(rank==0) {
        MPI_Send(table, 1, LEFT_SIDE, 1, ltag, MPI_COMM_WORLD);
    } else if(rank==1){
        MPI_Status status;
        MPI_Recv(message, 1, LEFT_SIDE, 0, ltag, MPI_COMM_WORLD, &status);
    }

    if(rank == 1 ){
        printf("Rank 1's received data: ");

        for(int i=0;i<dime;i++)
            printf("%6d ",*(message+i));

        printf("\n");
    }

    MPI_Finalize();
    return 0;

}

But when I run it and look at the data I received, I get either all zeros or gibberish:

$ mpicc -o datatype datatype.c -Wall -g -O3 -std=c99 
$ mpirun -np 2 datatype
Rank 1's received data:      0  32710 64550200      0 1828366128  32765 11780096      0      0

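where the numbers change every time. What am I doing wrong?

Best answer

@Mort's answer is correct, and was first; I just want to expand on it with some ASCII-art diagrams to try to drive his point home.

An MPI datatype describes how data is laid out in memory. Let's look at your 2d array for a smaller dime (say 4) and the corresponding MPI_Type_vector: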

 MPI_Type_vector(count=dime, blocksize=1, stride=dime-1, type=MPI_INT ...
                      = 4             =1        = 3

 data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15 };
 Vector:  X  -  -  X  -  -  X  -  -  X -  -

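Note that the stride in an MPI type is the distance between the starts of successive blocks, not the size of the gap between them; so you actually want stride=dime, not dime-1. That's easy to fix, but it isn't the actual problem: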

 MPI_Type_vector(count=dime, blocksize=1, stride=dime, type=MPI_INT ...
                      = 4             =1        = 4

 data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15 };
 Vector:  X  -  -  -  X  -  -  -  X  -  -  -  X -  -  -
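The two diagrams above can be reproduced with a few lines of plain C. This is only an illustrative sketch: vector_offsets is a hypothetical helper, not an MPI function, and it assumes offsets are counted in units of the base type (ints here), which is how MPI_Type_vector interprets its stride argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MPI: writes the element offsets that a
 * vector layout with the given count, block length and stride would select.
 * `out` must have room for count * blocklen entries.
 * Returns the number of offsets written. */
size_t vector_offsets(size_t count, size_t blocklen, size_t stride, size_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    for (size_t block = 0; block < count; block++)   /* blocks start `stride` apart */
        for (size_t k = 0; k < blocklen; k++)        /* elements inside one block */
            out[n++] = block * stride + k;
    return n;
}
```

For dime = 4, calling vector_offsets(4, 1, 3, out) fills out with 0, 3, 6, 9, while vector_offsets(4, 1, 4, out) fills it with 0, 4, 8, 12, which are the first elements of each row, i.e. the first column.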

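OK, so far so good; we're selecting the right elements. But we aren't receiving them correctly. The code tries to receive the data into an array of size dime using the same layout: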

int message[dime];
MPI_Recv(message, 1, LEFT_SIDE, 0, ...

message = { 0, 1, 2, 3 };
Vector:     X  -  -  -  X  -  -  -  X  -  -  -  X -  -  -

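The vector extends far beyond the bounds of message, which (a) leaves uninitialized data in message, which is where the gibberish comes from, and (b) may cause a segfault by writing past the end of the array.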
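To make the overrun concrete, here is a rough plain-C simulation of what a receive with a strided layout does to a small buffer. scatter_with_stride is a hypothetical helper, not an MPI call, and it assumes a block length of 1; unlike a real receive, it skips out-of-bounds slots and counts them instead of writing to them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MPI call: mimics how a receive with a vector
 * layout (block length 1) would place `count` incoming ints into `dst`.
 * Only slots inside dst (dst_len ints) are written; the return value is the
 * number of incoming ints that would have landed outside the buffer. */
size_t scatter_with_stride(const int *incoming, size_t count, size_t stride,
                           int *dst, size_t dst_len)
{
    assert(incoming != NULL && dst != NULL);
    size_t out_of_bounds = 0;
    for (size_t i = 0; i < count; i++) {
        size_t slot = i * stride;          /* where the i-th int is placed */
        if (slot < dst_len)
            dst[slot] = incoming[i];
        else
            out_of_bounds++;               /* a real receive would write here anyway */
    }
    return out_of_bounds;
}
```

With count = 9, stride = 8 and a 9-element message, only slots 0 and 8 are inside the buffer and 7 values fall outside it; with the corrected stride of 9, only slot 0 is inside and 8 values fall outside. In both cases most of message is never written.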

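Crucially, an MPI_Type_vector like this describes the layout of the desired data within the 2d matrix, but it does not describe the layout of that same data once it is received into a compact 1d array.

There are two options here. Either receive the data into the message array simply as dime x MPI_INT: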

// ....
} else if(rank==1){
    MPI_Status status;
    MPI_Recv(message, dime, MPI_INT, 0, ltag, MPI_COMM_WORLD, &status);
}

//...

$ mpirun -np 2 datatype
Rank 1's received data:      0      0      0      0      0      0      0      0      0
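This first option works because MPI matches a send and a receive by the sequence of basic types transferred, not by their memory layout: one LEFT_SIDE on the sending side carries dime MPI_INTs, so it can be received as dime contiguous MPI_INTs. Conceptually, the sender gathers the column into a contiguous run of ints. A minimal plain-C sketch of that gathering step follows; pack_column is a hypothetical helper, not an MPI function, and it assumes a row-major matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MPI call: gathers column `col` of a row-major
 * rows x cols matrix into the contiguous array `packed` (rows ints). */
void pack_column(const int *matrix, size_t rows, size_t cols, size_t col,
                 int *packed)
{
    assert(matrix != NULL && packed != NULL && col < cols);
    for (size_t r = 0; r < rows; r++)
        packed[r] = matrix[r * cols + col];   /* consecutive picks are cols ints apart */
}
```

Applied to the 3x3 example from the question, column 0 yields 1, 2, 5 and column 2 yields 3, 4, 7.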

Or receive the data directly into the 2d matrix on rank 1, overwriting the corresponding column:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define dime 9

int main (int argc, char *argv[])
{
    int size,rank;
    const int ltag=2;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);       // Get the number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       // Get the rank of the process

    int table[dime][dime];
    for (int i=0; i<dime; i++)
        for (int j=0; j<dime; j++)
            table[i][j] = rank;

    MPI_Datatype LEFT_SIDE;
    MPI_Type_vector(dime,1,dime,MPI_INT,&LEFT_SIDE);
    MPI_Type_commit(&LEFT_SIDE);

    if(rank==0) {
        MPI_Send(table, 1, LEFT_SIDE, 1, ltag, MPI_COMM_WORLD);
    } else if(rank==1){
        MPI_Status status;
        MPI_Recv(table, 1, LEFT_SIDE, 0, ltag, MPI_COMM_WORLD, &status);
    }

    if(rank == 1 ){
        printf("Rank 1's new array:\n");

        for(int i=0;i<dime;i++) {
            for(int j=0;j<dime;j++) 
                printf("%6d ",table[i][j]);
            printf("\n");
        }

        printf("\n");
    }

    MPI_Type_free(&LEFT_SIDE);
    MPI_Finalize();
    return 0;

}

Running this gives

$ mpicc -o datatype datatype.c -Wall -g -O3 -std=c99 
$ mpirun -np 2 datatype
Rank 1's new array:
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1 
     0      1      1      1      1      1      1      1      1

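(after correcting the MPI_Type_vector)

The rest, namely how to extend this to multiple columns, is probably best left to another question.

The original question, "Cannot receive a subset of an array using an MPI datatype", is on Stack Overflow: https://stackoverflow.com/questions/31977456/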