c - 使用MPI的不同大小的散布矩阵 block

标签 c parallel-processing mpi distributed-computing message-passing

(假设所有的矩阵都按行的主顺序存储。)一个例子说明了这个问题:在3x3网格上分布一个10x10的矩阵,这样每个节点中的子矩阵的大小看起来像

|-----+-----+-----|
| 3x3 | 3x3 | 3x4 |
|-----+-----+-----|
| 3x3 | 3x3 | 3x4 |
|-----+-----+-----|
| 4x3 | 4x3 | 4x4 |
|-----+-----+-----|

我在stackoverflow上见过很多帖子(比如sending blocks of 2D array in C using MPIMPI partition matrix into blocks)。但它们只处理相同大小的块(在这种情况下,我们可以简单地使用MPI_Type_vectorMPI_Type_create_subarray并且只使用一个MPI_Scatterv调用)。
所以,我想知道在MPI中,将矩阵分散到每个处理器都有一个指定大小的块的处理器网格中最有效的方法是什么。
另外,我还研究了MPI_Type_create_darray,但它似乎不允许您为每个处理器指定块大小。

最佳答案

要做到这一点,必须在mpi中至少多经历一步。
问题是,最常见的聚集/散布例程MPI_ScattervMPI_Gatherv允许传递计数/位移的“向量”(v),而不是仅传递一个散射和聚集计数,但假设所有类型都相同。在这里,没有办法绕过它;每个块的内存布局是不同的,因此必须用不同的类型来处理。如果块之间只有一个区别(有些块的列数不同,有些块的行数不同),那么只使用不同的计数就足够了。但是对于不同的列和行,计数并不能做到这一点;您确实需要能够指定不同的类型。
因此,您真正想要的是一个经常讨论但从未实现的mpi_scatterw(其中w表示vv;例如,计数和类型都是向量)例程。但这样的事情是不存在的。最接近的是更通用的MPI_Alltoallw调用,它允许对所有数据的发送和接收进行完全通用;正如规范所述,"The MPI_ALLTOALLW function generalizes several MPI functions by carefully selecting the input arguments. For example, by making all but one process have sendcounts(i) = 0, this achieves an MPI_SCATTERW function."
因此,您可以使用mpi_alltoallw来实现这一点,方法是让所有进程(除了最初拥有所有数据的进程(我们假设它在这里排名为0))将它们的所有发送计数发送到零。除第一个任务外,所有任务的接收计数都将为零,第一个任务的接收计数是从排名为零的任务中获得的数据量。
对于进程0的发送计数,我们首先必须定义四种不同的类型(4种不同大小的子阵列),然后发送计数都是1,剩下的唯一部分是计算发送位移(与scatterv不同,这里是以字节为单位的,因为没有一种类型可以用作一个单位):

        /* 4 types of blocks - 
         * blocksize*blocksize, blocksize+1*blocksize, blocksize*blocksize+1, blocksize+1*blocksize+1
         */

        MPI_Datatype blocktypes[4];
        int subsizes[2];
        int starts[2] = {0,0};
        for (int i=0; i<2; i++) {
           subsizes[0] = blocksize+i;
           for (int j=0; j<2; j++) {
               subsizes[1] = blocksize+j;
               MPI_Type_create_subarray(2, globalsizes, subsizes, starts, MPI_ORDER_C, MPI_CHAR, &blocktypes[2*i+j]);
               MPI_Type_commit(&blocktypes[2*i+j]);
           }
        }

        /* now figure out the displacement and type of each processor's data */
        for (int proc=0; proc<size; proc++) {
            int row, col;
            rowcol(proc, blocks, &row, &col);

            sendcounts[proc] = 1;
            senddispls[proc] = (row*blocksize*globalsizes[1] + col*blocksize)*sizeof(char);

            int idx = typeIdx(row, col, blocks);
            sendtypes[proc] = blocktypes[idx];
        }
    }

    MPI_Alltoallw(globalptr, sendcounts, senddispls, sendtypes,
                  &(localdata[0][0]), recvcounts, recvdispls, recvtypes, 
                  MPI_COMM_WORLD);

这样就行了。
但问题是alltoallw函数是如此的通用,以至于实现很难在优化过程中做很多事情;所以如果它能像分散大小相同的块一样执行的话,我会感到惊讶。
所以另一种方法是做一些类似于两个阶段的交流。
在注意到您几乎可以通过一个MPI_Scatterv()调用获得所需的所有数据之后,下面是最简单的方法:在您的示例中,如果我们使用一个列向量的单位进行操作,列=1,行=3(域中大多数块中的行数),您可以将几乎所有全局数据分散到其他处理器。每个处理器都得到这些向量中的3或4个,这些向量分布所有的数据,除了全局数组的最后一行,后者可以由一个简单的第二散射体处理。看起来像这样;
/* We're going to be operating mostly in units of a single column of a "normal" sized block.
 * There will need to be two vectors describing these columns; one in the context of the
 * global array, and one in the local results.
 */
MPI_Datatype vec, localvec;
MPI_Type_vector(blocksize, 1, localsizes[1], MPI_CHAR, &localvec);
MPI_Type_create_resized(localvec, 0, sizeof(char), &localvec);
MPI_Type_commit(&localvec);

MPI_Type_vector(blocksize, 1, globalsizes[1], MPI_CHAR, &vec);
MPI_Type_create_resized(vec, 0, sizeof(char), &vec);
MPI_Type_commit(&vec);

/* The originating process needs to allocate and fill the source array,
 * and then define types defining the array chunks to send, and
 * fill out senddispls, sendcounts (1) and sendtypes.
 */
if (rank == 0) {
    /* create the vector type which will send one column of a "normal" sized-block */
    /* then all processors except those in the last row need to get blocksize*vec or (blocksize+1)*vec */
    /* will still have to do something to tidy up the last row of values */
    /* we need to make the type have extent of 1 char for scattering */
    for (int proc=0; proc<size; proc++) {
        int row, col;
        rowcol(proc, blocks, &row, &col);

        sendcounts[proc] = isLastCol(col, blocks) ? blocksize+1 : blocksize;
        senddispls[proc] = (row*blocksize*globalsizes[1] + col*blocksize);
    }
}

recvcounts = localsizes[1];
MPI_Scatterv(globalptr, sendcounts, senddispls, vec,
              &(localdata[0][0]), recvcounts, localvec, 0, MPI_COMM_WORLD);

MPI_Type_free(&localvec);
if (rank == 0)
    MPI_Type_free(&vec);

/* now we need to do one more scatter, scattering just the last row of data
 * just to the processors on the last row.
 * Here we recompute the send counts
 */
if (rank == 0) {
    for (int proc=0; proc<size; proc++) {
        int row, col;
        rowcol(proc, blocks, &row, &col);
        sendcounts[proc] = 0;
        senddispls[proc] = 0;

        if ( isLastRow(row,blocks) ) {
            sendcounts[proc] = blocksize;
            senddispls[proc] = (globalsizes[0]-1)*globalsizes[1]+col*blocksize;
            if ( isLastCol(col,blocks) )
                sendcounts[proc] += 1;
        }
    }
}

recvcounts = 0;
if ( isLastRow(myrow, blocks) ) {
    recvcounts = blocksize;
    if ( isLastCol(mycol, blocks) )
        recvcounts++;
}
MPI_Scatterv(globalptr, sendcounts, senddispls, MPI_CHAR,
              &(localdata[blocksize][0]), recvcounts, MPI_CHAR, 0, MPI_COMM_WORLD);

到现在为止,一直都还不错。但遗憾的是,在最后的“清理”过程中,大多数处理器无所事事。
因此,更好的方法是在第一阶段分散所有行,在第二阶段将数据分散到列中。在这里,我们创建新的通讯器,每个处理器都属于两个新的通讯器—一个表示同一块行中的其他处理器,另一个表示同一块列中的其他处理器。在第一步中,源处理器将全局数组的所有行分发给同一列通信器中的其他处理器—这可以在单个散射器中完成。然后,这些处理器使用一个scatterv和与上一个示例中相同的列数据类型,将列分散到与它在同一块行中的每个处理器。结果是两个相当简单的散射体分布了所有的数据:
/* create communicators which have processors with the same row or column in them*/
MPI_Comm colComm, rowComm;
MPI_Comm_split(MPI_COMM_WORLD, myrow, rank, &rowComm);
MPI_Comm_split(MPI_COMM_WORLD, mycol, rank, &colComm);

/* first, scatter the array by rows, with the processor in column 0 corresponding to each row
 * receiving the data */
if (mycol == 0) {
    int sendcounts[ blocks[0] ];
    int senddispls[ blocks[0] ];
    senddispls[0] = 0;

    for (int row=0; row<blocks[0]; row++) {
        /* each processor gets blocksize rows, each of size globalsizes[1]... */
        sendcounts[row] = blocksize*globalsizes[1];
        if (row > 0)
            senddispls[row] = senddispls[row-1] + sendcounts[row-1];
    }
    /* the last processor gets one more */
    sendcounts[blocks[0]-1] += globalsizes[1];

    /* allocate my rowdata */
    rowdata = allocchar2darray( sendcounts[myrow], globalsizes[1] );

    /* perform the scatter of rows */
    MPI_Scatterv(globalptr, sendcounts, senddispls, MPI_CHAR,
                  &(rowdata[0][0]), sendcounts[myrow], MPI_CHAR, 0, colComm);

}

/* Now, within each row of processors, we can scatter the columns.
 * We can do this as we did in the previous example; create a vector
 * (and localvector) type and scatter accordingly */
int locnrows = blocksize;
if ( isLastRow(myrow, blocks) )
    locnrows++;
MPI_Datatype vec, localvec;
MPI_Type_vector(locnrows, 1, globalsizes[1], MPI_CHAR, &vec);
MPI_Type_create_resized(vec, 0, sizeof(char), &vec);
MPI_Type_commit(&vec);

MPI_Type_vector(locnrows, 1, localsizes[1], MPI_CHAR, &localvec);
MPI_Type_create_resized(localvec, 0, sizeof(char), &localvec);
MPI_Type_commit(&localvec);

int sendcounts[ blocks[1] ];
int senddispls[ blocks[1] ];
if (mycol == 0) {
    for (int col=0; col<blocks[1]; col++) {
        sendcounts[col] = isLastCol(col, blocks) ? blocksize+1 : blocksize;
        senddispls[col] = col*blocksize;
    }
}
char *rowptr = (mycol == 0) ? &(rowdata[0][0]) : NULL;

MPI_Scatterv(rowptr, sendcounts, senddispls, vec,
              &(localdata[0][0]), sendcounts[mycol], localvec, 0, rowComm);

这更简单,应该是性能和健壮性之间的一个相对较好的平衡。
运行这三种方法都可以:
bash-3.2$ mpirun -np 6 ./allmethods alltoall
Global array:
abcdefg
hijklmn
opqrstu
vwxyzab
cdefghi
jklmnop
qrstuvw
xyzabcd
efghijk
lmnopqr
Method - alltoall

Rank 0:
abc
hij
opq

Rank 1:
defg
klmn
rstu

Rank 2:
vwx
cde
jkl

Rank 3:
yzab
fghi
mnop

Rank 4:
qrs
xyz
efg
lmn

Rank 5:
tuvw
abcd
hijk
opqr

bash-3.2$ mpirun -np 6 ./allmethods twophasevecs
Global array:
abcdefg
hijklmn
opqrstu
vwxyzab
cdefghi
jklmnop
qrstuvw
xyzabcd
efghijk
lmnopqr
Method - two phase, vectors, then cleanup

Rank 0:
abc
hij
opq

Rank 1:
defg
klmn
rstu

Rank 2:
vwx
cde
jkl

Rank 3:
yzab
fghi
mnop

Rank 4:
qrs
xyz
efg
lmn

Rank 5:
tuvw
abcd
hijk
opqr
bash-3.2$ mpirun -np 6 ./allmethods twophaserowcol
Global array:
abcdefg
hijklmn
opqrstu
vwxyzab
cdefghi
jklmnop
qrstuvw
xyzabcd
efghijk
lmnopqr
Method - two phase - row, cols

Rank 0:
abc
hij
opq

Rank 1:
defg
klmn
rstu

Rank 2:
vwx
cde
jkl

Rank 3:
yzab
fghi
mnop

Rank 4:
qrs
xyz
efg
lmn

Rank 5:
tuvw
abcd
hijk
opqr

实现这些方法的代码如下所示;您可以针对问题将块大小设置为更典型的大小,并在实际数量的处理器上运行,以了解哪个对您的应用程序最合适。
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

/* auxiliary routines, found at end of program */

char **allocchar2darray(int n, int m);
void freechar2darray(char **a);
void printarray(char **data, int n, int m);
void rowcol(int rank, const int blocks[2], int *row, int *col);
int isLastRow(int row, const int blocks[2]);
int isLastCol(int col, const int blocks[2]);
int typeIdx(int row, int col, const int blocks[2]);

/* first method - alltoallw */
void alltoall(const int myrow, const int mycol, const int rank, const int size, 
                    const int blocks[2], const int blocksize, const int globalsizes[2], const int localsizes[2],
                    const char *const globalptr,  char **localdata) {
    /*
     * get send and recieve counts ready for alltoallw call. 
     * everyone will be recieving just one block from proc 0; 
     * most procs will be sending nothing to anyone. 
     */
    int sendcounts[ size ];
    int senddispls[ size ];
    MPI_Datatype sendtypes[size];
    int recvcounts[ size ];
    int recvdispls[ size ];
    MPI_Datatype recvtypes[size];

    for (int proc=0; proc<size; proc++) {
        recvcounts[proc] = 0;
        recvdispls[proc] = 0;
        recvtypes[proc] = MPI_CHAR;

        sendcounts[proc] = 0;
        senddispls[proc] = 0;
        sendtypes[proc] = MPI_CHAR;
    }
    recvcounts[0] = localsizes[0]*localsizes[1];
    recvdispls[0] = 0;


    /* The originating process needs to allocate and fill the source array,
     * and then define types defining the array chunks to send, and 
     * fill out senddispls, sendcounts (1) and sendtypes.
     */
    if (rank == 0) {
        /* 4 types of blocks - 
         * blocksize*blocksize, blocksize+1*blocksize, blocksize*blocksize+1, blocksize+1*blocksize+1
         */
        MPI_Datatype blocktypes[4];
        int subsizes[2];
        int starts[2] = {0,0};
        for (int i=0; i<2; i++) {
           subsizes[0] = blocksize+i;
           for (int j=0; j<2; j++) {
               subsizes[1] = blocksize+j;
               MPI_Type_create_subarray(2, globalsizes, subsizes, starts, MPI_ORDER_C, MPI_CHAR, &blocktypes[2*i+j]);
               MPI_Type_commit(&blocktypes[2*i+j]);
           }
        }

        /* now figure out the displacement and type of each processor's data */
        for (int proc=0; proc<size; proc++) {
            int row, col;
            rowcol(proc, blocks, &row, &col);

            sendcounts[proc] = 1;
            senddispls[proc] = (row*blocksize*globalsizes[1] + col*blocksize)*sizeof(char);

            int idx = typeIdx(row, col, blocks);
            sendtypes[proc] = blocktypes[idx];
        }
    }

    MPI_Alltoallw(globalptr, sendcounts, senddispls, sendtypes,
                  &(localdata[0][0]), recvcounts, recvdispls, recvtypes, 
                  MPI_COMM_WORLD);
}


/* second  method: distribute almost all data using colums of size blocksize, 
 * then clean up the last row with another scatterv */

void twophasevecs(const int myrow, const int mycol, const int rank, const int size, 
                    const int blocks[2], const int blocksize, const int globalsizes[2], const int localsizes[2],
                    const char *const globalptr,  char **localdata) {
    int sendcounts[ size ];
    int senddispls[ size ];
    int recvcounts;

    for (int proc=0; proc<size; proc++) {
        sendcounts[proc] = 0;
        senddispls[proc] = 0;
    }

    /* We're going to be operating mostly in units of a single column of a "normal" sized block.
     * There will need to be two vectors describing these columns; one in the context of the
     * global array, and one in the local results.
     */
    MPI_Datatype vec, localvec;
    MPI_Type_vector(blocksize, 1, localsizes[1], MPI_CHAR, &localvec);
    MPI_Type_create_resized(localvec, 0, sizeof(char), &localvec);
    MPI_Type_commit(&localvec);

    MPI_Type_vector(blocksize, 1, globalsizes[1], MPI_CHAR, &vec);
    MPI_Type_create_resized(vec, 0, sizeof(char), &vec);
    MPI_Type_commit(&vec);

    /* The originating process needs to allocate and fill the source array,
     * and then define types defining the array chunks to send, and 
     * fill out senddispls, sendcounts (1) and sendtypes.
     */
    if (rank == 0) {
        /* create the vector type which will send one column of a "normal" sized-block */
        /* then all processors except those in the last row need to get blocksize*vec or (blocksize+1)*vec */
        /* will still have to do something to tidy up the last row of values */
        /* we need to make the type have extent of 1 char for scattering */
        for (int proc=0; proc<size; proc++) {
            int row, col;
            rowcol(proc, blocks, &row, &col);

            sendcounts[proc] = isLastCol(col, blocks) ? blocksize+1 : blocksize;
            senddispls[proc] = (row*blocksize*globalsizes[1] + col*blocksize);
        }
    }

    recvcounts = localsizes[1];
    MPI_Scatterv(globalptr, sendcounts, senddispls, vec,
                  &(localdata[0][0]), recvcounts, localvec, 0, MPI_COMM_WORLD);

    MPI_Type_free(&localvec);
    if (rank == 0)
        MPI_Type_free(&vec);

    /* now we need to do one more scatter, scattering just the last row of data 
     * just to the processors on the last row.
     * Here we recompute the sendcounts
     */
    if (rank == 0) {
        for (int proc=0; proc<size; proc++) {
            int row, col;
            rowcol(proc, blocks, &row, &col);
            sendcounts[proc] = 0;
            senddispls[proc] = 0;

            if ( isLastRow(row,blocks) ) {
                sendcounts[proc] = blocksize;
                senddispls[proc] = (globalsizes[0]-1)*globalsizes[1]+col*blocksize;
                if ( isLastCol(col,blocks) ) 
                    sendcounts[proc] += 1;
            }
        }
    }

    recvcounts = 0;
    if ( isLastRow(myrow, blocks) ) {
        recvcounts = blocksize;
        if ( isLastCol(mycol, blocks) )
            recvcounts++;
    }
    MPI_Scatterv(globalptr, sendcounts, senddispls, MPI_CHAR,
                  &(localdata[blocksize][0]), recvcounts, MPI_CHAR, 0, MPI_COMM_WORLD);
}
/* third method: first distribute rows, then columns, each with a single scatterv */

void twophaseRowCol(const int myrow, const int mycol, const int rank, const int size, 
                    const int blocks[2], const int blocksize, const int globalsizes[2], const int localsizes[2],
                    const char *const globalptr,  char **localdata) {
    char **rowdata ;

    /* create communicators which have processors with the same row or column in them*/
    MPI_Comm colComm, rowComm;
    MPI_Comm_split(MPI_COMM_WORLD, myrow, rank, &rowComm);
    MPI_Comm_split(MPI_COMM_WORLD, mycol, rank, &colComm);

    /* first, scatter the array by rows, with the processor in column 0 corresponding to each row
     * receiving the data */
    if (mycol == 0) {
        int sendcounts[ blocks[0] ];
        int senddispls[ blocks[0] ];
        senddispls[0] = 0;

        for (int row=0; row<blocks[0]; row++) {
            /* each processor gets blocksize rows, each of size globalsizes[1]... */
            sendcounts[row] = blocksize*globalsizes[1];
            if (row > 0) 
                senddispls[row] = senddispls[row-1] + sendcounts[row-1];
        }
        /* the last processor gets one more */
        sendcounts[blocks[0]-1] += globalsizes[1];

        /* allocate my rowdata */
        rowdata = allocchar2darray( sendcounts[myrow], globalsizes[1] );

        /* perform the scatter of rows */
        MPI_Scatterv(globalptr, sendcounts, senddispls, MPI_CHAR,
                      &(rowdata[0][0]), sendcounts[myrow], MPI_CHAR, 0, colComm);

    }

    /* Now, within each row of processors, we can scatter the columns.  
     * We can do this as we did in the previous example; create a vector
     * (and localvector) type and scatter accordingly */
    int locnrows = blocksize;
    if ( isLastRow(myrow, blocks) )
        locnrows++;

    MPI_Datatype vec, localvec;
    MPI_Type_vector(locnrows, 1, globalsizes[1], MPI_CHAR, &vec);
    MPI_Type_create_resized(vec, 0, sizeof(char), &vec);
    MPI_Type_commit(&vec);

    MPI_Type_vector(locnrows, 1, localsizes[1], MPI_CHAR, &localvec);
    MPI_Type_create_resized(localvec, 0, sizeof(char), &localvec);
    MPI_Type_commit(&localvec);

    int sendcounts[ blocks[1] ];
    int senddispls[ blocks[1] ];
    if (mycol == 0) {
        for (int col=0; col<blocks[1]; col++) {
            sendcounts[col] = isLastCol(col, blocks) ? blocksize+1 : blocksize;
            senddispls[col] = col*blocksize;
        }
    }
    char *rowptr = (mycol == 0) ? &(rowdata[0][0]) : NULL;

    MPI_Scatterv(rowptr, sendcounts, senddispls, vec,
                  &(localdata[0][0]), sendcounts[mycol], localvec, 0, rowComm);

    MPI_Type_free(&localvec);
    MPI_Type_free(&vec);

    if (mycol == 0) 
        freechar2darray(rowdata);

    MPI_Comm_free(&rowComm);
    MPI_Comm_free(&colComm);
}

int main(int argc, char **argv) {

    int rank, size;
    int blocks[2] = {0,0};
    const int blocksize=3;
    int globalsizes[2], localsizes[2];
    char **globaldata;
    char *globalptr = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && argc < 2) {
        fprintf(stderr,"Usage: %s method\n   Where method is one of: alltoall, twophasevecs, twophaserowcol\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD,1);
    }

    /* calculate sizes for a 2d grid of processors */
    MPI_Dims_create(size, 2, blocks);

    int myrow, mycol;
    rowcol(rank, blocks, &myrow, &mycol);

    /* create array sizes so that last block has 1 too many rows/cols */
    globalsizes[0] = blocks[0]*blocksize+1;  
    globalsizes[1] = blocks[1]*blocksize+1;
    if (rank == 0) {
        globaldata = allocchar2darray(globalsizes[0], globalsizes[1]);
        globalptr = &(globaldata[0][0]);
        for (int i=0; i<globalsizes[0]; i++) 
            for (int j=0; j<globalsizes[1]; j++)
                globaldata[i][j] = 'a'+(i*globalsizes[1] + j)%26;

        printf("Global array: \n");
        printarray(globaldata, globalsizes[0], globalsizes[1]);
    }

    /* the local chunk we'll be receiving */
    localsizes[0] = blocksize; localsizes[1] = blocksize;
    if ( isLastRow(myrow,blocks)) localsizes[0]++;
    if ( isLastCol(mycol,blocks)) localsizes[1]++;
    char **localdata = allocchar2darray(localsizes[0],localsizes[1]);

    if (!strcasecmp(argv[1], "alltoall")) {
        if (rank == 0) printf("Method - alltoall\n");
        alltoall(myrow, mycol, rank, size, blocks, blocksize, globalsizes, localsizes, globalptr, localdata);
    } else if (!strcasecmp(argv[1],"twophasevecs")) {
        if (rank == 0) printf("Method - two phase, vectors, then cleanup\n");
        twophasevecs(myrow, mycol, rank, size, blocks, blocksize, globalsizes, localsizes, globalptr, localdata);
    } else {
        if (rank == 0) printf("Method - two phase - row, cols\n");
        twophaseRowCol(myrow, mycol, rank, size, blocks, blocksize, globalsizes, localsizes, globalptr, localdata);
    }

    for (int proc=0; proc<size; proc++) {
        if (proc == rank) {
            printf("\nRank %d:\n", proc);
            printarray(localdata, localsizes[0], localsizes[1]);
        }
        MPI_Barrier(MPI_COMM_WORLD);            
    }

    freechar2darray(localdata);
    if (rank == 0) 
        freechar2darray(globaldata);

    MPI_Finalize();

    return 0;
}

char **allocchar2darray(int n, int m) {
    char **ptrs = malloc(n*sizeof(char *));
    ptrs[0] = malloc(n*m*sizeof(char));
    for (int i=0; i<n*m; i++)
        ptrs[0][i]='.';

    for (int i=1; i<n; i++) 
        ptrs[i] = ptrs[i-1] + m;

    return ptrs;
}

void freechar2darray(char **a) {
    free(a[0]);
    free(a);
}

void printarray(char **data, int n, int m) {
    for (int i=0; i<n; i++) {
        for (int j=0; j<m; j++) 
            putchar(data[i][j]);
        putchar('\n');
    }
}

void rowcol(int rank, const int blocks[2], int *row, int *col) {
    *row = rank/blocks[1];
    *col = rank % blocks[1];
}

int isLastRow(int row, const int blocks[2]) {
    return (row == blocks[0]-1);
}

int isLastCol(int col, const int blocks[2]) {
    return (col == blocks[1]-1);
}

int typeIdx(int row, int col, const int blocks[2]) {
    int lastrow = (row == blocks[0]-1);
    int lastcol = (col == blocks[1]-1);

    return lastrow*2 + lastcol;
}

关于c - 使用MPI的不同大小的散布矩阵 block ,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29325513/

相关文章:

c++ - Valgrind 调试结果不显示发生错误的行数

计算 #include 和 #define 的数量

java - Spark RDD,如何生成长度为N的JavaRDD?

c++ - MPI 同步 vector 矩阵

macos - 如何将stdin从文件传输到Xcode 4+中的可执行文件?

c - C 中的指针忽略无符号特征?

c - 害怕。寻找。告诉。 "warning: value computed is not used "

linux - 在双核处理器上执行 2 个操作系统

algorithm - Python、Java、C# 和并行算法

c - MPI 将矩阵划分为 block