c - MPI matrix multiplication, processes not cleaning up

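I am trying to multiply two n x n matrices using MPI. The second matrix (bb) is broadcast to all the "slave" processes, and then rows of the first matrix (aa) are sent out one at a time so that each slave can compute the corresponding row of the product. The slave sends its answer back to the master process, which stores it in the product matrix cc1. For some reason, I get this error: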

=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

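I believe the master process is receiving all the messages sent by the slave processes, and vice versa, so I'm not sure what is going wrong here... any ideas?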

Main code:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/times.h>
#define min(x, y) ((x)<(y)?(x):(y))
#define MASTER 0

double* gen_matrix(int n, int m);
int mmult(double *c, double *a, int aRows, int aCols, double *b, int bRows, int bCols);

int main(int argc, char* argv[]) {
    int nrows, ncols;
    double *aa;     /* the A matrix */
    double *bb;     /* the B matrix */
    double *cc1;    /* A x B computed */
    double *buffer; /* Row to send to slave for processing */
    double *ans;    /* Computed answer for master */
    int myid, numprocs;
    int i, j, numsent, sender;
    int row, anstype;
    double starttime, endtime;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (argc > 1) {
        nrows = atoi(argv[1]);
        ncols = nrows;
        if (myid == 0) {
            /* Master Code */
            aa = gen_matrix(nrows, ncols);
            bb = gen_matrix(ncols, nrows);
            cc1 = malloc(sizeof(double) * nrows * nrows);
            starttime = MPI_Wtime();
            buffer = (double*)malloc(sizeof(double) * ncols);
            numsent = 0;
            MPI_Bcast(bb, ncols*nrows, MPI_DOUBLE, MASTER, MPI_COMM_WORLD); /*broadcast bb to all slaves*/
            for (i = 0; i < min(numprocs-1, nrows); i++) {                  /*for each process or row*/
                for (j = 0; j < ncols; j++) {                               /*for each column*/
                    buffer[j] = aa[i * ncols + j];                          /*get row of aa*/
                }
                MPI_Send(buffer, ncols, MPI_DOUBLE, i+1, i+1, MPI_COMM_WORLD); /*send row to slave*/
                numsent++;                                                     /*increment number of rows sent*/
            }
            ans = (double*)malloc(sizeof(double) * ncols);
            for (i = 0; i < nrows; i++) {
                MPI_Recv(ans, ncols, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                sender = status.MPI_SOURCE;
                anstype = status.MPI_TAG;

                for (i = 0; i < ncols; i++){
                    cc1[(anstype-1) * ncols + i] = ans[i];
                }

                if (numsent < nrows) {
                    for (j = 0; j < ncols; j++) {
                        buffer[j] = aa[numsent*ncols + j];
                    }
                    MPI_Send(buffer, ncols, MPI_DOUBLE, sender, numsent+1,
                             MPI_COMM_WORLD);
                    numsent++;
                } else {
                    MPI_Send(MPI_BOTTOM, 0, MPI_DOUBLE, sender, 0, MPI_COMM_WORLD);
                }
            }

            endtime = MPI_Wtime();
            printf("%f\n",(endtime - starttime));
        } else {
            /* Slave Code */
            buffer = (double*)malloc(sizeof(double) * ncols);
            bb = (double*)malloc(sizeof(double) * ncols*nrows);
            MPI_Bcast(bb, ncols*nrows, MPI_DOUBLE, MASTER, MPI_COMM_WORLD); /*get bb*/
            if (myid <= nrows) {
                while(1) {
                    MPI_Recv(buffer, ncols, MPI_DOUBLE, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status); /*receive a row of aa*/
                    if (status.MPI_TAG == 0){
                        break;
                    }

                    row = status.MPI_TAG; /*get row number*/
                    ans = (double*)malloc(sizeof(double) * ncols);
                    for (i = 0; i < ncols; i++){
                        ans[i]=0.0;
                    }
                    for (i=0; i<nrows; i++){
                        for (j = 0; j < ncols; j++) { /*for each column*/
                            ans[i] += buffer[j] * bb[j * ncols + i];
                        }
                    }
                    MPI_Send(ans, ncols, MPI_DOUBLE, MASTER, row, MPI_COMM_WORLD);
                }
            }
        } /*end slave code*/
    } else {
        fprintf(stderr, "Usage matrix_times_vector <size>\n");
    }
    MPI_Finalize();
    return 0;
}

Best answer

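This error message usually means that at least one MPI process crashed and the whole MPI job was then aborted. It can be caused by any kind of error, but most of the time it is a segmentation fault caused by an invalid memory access. The exit code 11 in your output is consistent with that: on Linux, signal 11 is SIGSEGV.

I haven't looked at the code closely, so I can't say whether the logic is valid, but what I can say is that this line is problematic: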

MPI_Recv(&ans, nrows, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

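Indeed, there are two problems here:

&ans is a double**, which is not what you want; I'd guess you meant ans
ans has not been allocated yet, so it cannot be used as a receive buffer

Try fixing that first and see what happens.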

Edit: in your new code, you allocate ans as follows:

ans = (double*)malloc(sizeof(double) * ncols);

Then you initialize it like this:

for (i = 0; i < nrows; i++) {
    ans[i]=0.0;
}

Then you use it like this:

MPI_Send(ans, nrows, MPI_DOUBLE, MASTER, row, MPI_COMM_WORLD);

or

MPI_Recv(ans, nrows, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

This is inconsistent: is the size of ans ncols or nrows?

What is your new error message?

Regarding "c - MPI matrix multiplication, processes not cleaning up", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32730995/
