c - Error passing an array with MPI_Send and MPI_Recv

Tags: c, mpi

I am trying to send and receive an array of doubles via MPI_Send and MPI_Recv, but it is not working.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>

#define N 5
#define ITERS 10
#define ARRAY_SIZE (N+2) * (N+2)
// N and ITERS might be input arguments

double **A;

void initialize (double **A)
{
  int i,j;

   for(i =0; i < N+2 ; i++){
     for(j =0; j < N+2 ; j++){
      if(i== 0 || j == 0 || i == (N+1) || j == (N +1) )
        A[i][j] = 0.0;
      else
        A[i][j] = rand() % 10 + 1;
     }
   }
}
void showArray(double **A){
  int i,j;
  printf("\n");
  for(i =0 ; i < N+2 ; i++){
    for(j =0; j < N+2 ; j++){
      printf("%f, ",A[i][j]);
    }
    printf("\n");
  }
}

void stencil(double **A){
  int i,j;
  printf("\n");
  for(i =1 ; i <= N ; i++){
    for(j =1; j <=N ; j++){
      A[i][j] = 0.3 *( A[i][j] + A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]);
    }
  }
}


int main(int argc, char * argv[]){

  int MyProc, size,tag=1;
  char msg='A', msg_recpt;
  MPI_Status status;
  double **received_array;

  //showArray(A);
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &MyProc);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("Process # %d started \n", MyProc);
  MPI_Barrier(MPI_COMM_WORLD);

  //allocating received_array
  received_array = malloc((N+2) * sizeof(double *));
  int i;
  for (i=0; i<N+2; i++) {
    received_array[i] = malloc((N+2) * sizeof(double));
  }

  if(MyProc == 0){
    A = malloc((N+2) * sizeof(double *));
    int i;
    for (i=0; i<N+2; i++) {
      A[i] = malloc((N+2) * sizeof(double));
    }
    initialize(A);
    stencil(A);
    showArray(A);
    //printf("sizeof: %d\n",sizeof(A)/sizeof(double));

    MPI_Send(A, ARRAY_SIZE, MPI_DOUBLE, MyProc +1,tag, MPI_COMM_WORLD);
    printf("Proc #%d enviando a #%d\n",MyProc,MyProc+1 );
  }

  if(MyProc > 0 && MyProc < size -1){
    MPI_Recv(received_array, ARRAY_SIZE, MPI_DOUBLE, MyProc- 1, tag, MPI_COMM_WORLD, &status);

    printf("Proc #%d recibe de Proc #%d\n",MyProc,MyProc- 1 );
    //stencil(A);
    printf("Proc #%d enviando a #%d\n",MyProc,MyProc+1 );
    MPI_Send(received_array, ARRAY_SIZE, MPI_DOUBLE, MyProc +1,tag, MPI_COMM_WORLD);
  }

  if(MyProc == size -1 ){
    MPI_Recv(received_array, ARRAY_SIZE, MPI_DOUBLE, MyProc- 1, tag, MPI_COMM_WORLD, &status);
    printf("Proc #%d recibe de Proc #%d\n",MyProc,MyProc- 1 );
    //stencil(A);
  }

  printf("Finishing proc %d\n", MyProc);
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();

}

I get this error:

[compute-0-4.local:30784] *** An error occurred in MPI_Recv
[compute-0-4.local:30784] *** on communicator MPI_COMM_WORLD
[compute-0-4.local:30784] *** MPI_ERR_BUFFER: invalid buffer pointer
[compute-0-4.local:30784] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-4.local][[28950,1],0][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 30784 on
node compute-0-4.local exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-4.local:30782] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[compute-0-4.local:30782] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Now, after allocating memory for received_array, I get this error message instead:

[compute-0-0:18176] *** Process received signal ***
[compute-0-0:18177] *** Process received signal ***
[compute-0-0:18177] Signal: Segmentation fault (11)
[compute-0-0:18177] Signal code:  (128)
[compute-0-0:18177] Failing at address: (nil)
[compute-0-0:18176] Signal: Segmentation fault (11)
[compute-0-0:18176] Signal code: Address not mapped (1)
[compute-0-0:18176] Failing at address: 0x10
[compute-0-0:18176] [ 0] /lib64/libpthread.so.0() [0x326fa0f500]
[compute-0-0:18176] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0xae) [0x2b22bf88211e]
[compute-0-0:18176] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x57) [0x2b22bf883b87]
[compute-0-0:18176] [ 3] /opt/openmpi/lib/libmpi.so.1(+0x2258f7) [0x2b22bf88b8f7]
[compute-0-0:18176] [ 4] /opt/openmpi/lib/libmpi.so.1(mca_base_param_reg_int_name+0x3f) [0x2b22bf88bd9f]
[compute-0-0:18176] [ 5] /opt/openmpi/lib/libmpi.so.1(ompi_mpi_finalize+0x126) [0x2b22bf6f5fb6]
[compute-0-0:18176] [ 6] ./ej7(main+0x2d2) [0x4010e8]
[compute-0-0:18176] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x326f21ecdd]
[compute-0-0:18176] [ 8] ./ej7() [0x400ac9]
[compute-0-0:18176] *** End of error message ***
[compute-0-0:18177] [ 0] /lib64/libpthread.so.0() [0x326fa0f500]
[compute-0-0:18177] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0xae) [0x2b52f96ff11e]
[compute-0-0:18177] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x57) [0x2b52f9700b87]
[compute-0-0:18177] [ 3] /opt/openmpi/lib/libmpi.so.1(+0x2258f7) [0x2b52f97088f7]
[compute-0-0:18177] [ 4] /opt/openmpi/lib/libmpi.so.1(mca_base_param_reg_int_name+0x3f) [0x2b52f9708d9f]
[compute-0-0:18177] [ 5] /opt/openmpi/lib/libmpi.so.1(ompi_mpi_finalize+0x126) [0x2b52f9572fb6]
[compute-0-0:18177] [ 6] ./ej7(main+0x2d2) [0x4010e8]
[compute-0-0:18177] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x326f21ecdd]
[compute-0-0:18177] [ 8] ./ej7() [0x400ac9]
[compute-0-0:18177] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 18176 on node compute-0-0.local exited on signal 11 (Segmentation fault).

Best Answer

Allocate received_array in the same way you allocate A.

MPI does not allocate the memory for you, even though you pass it the array.

The problem with the edited question is that you are transmitting a square matrix allocated as a pointer to pointers with a single MPI send, instead of using N+2 calls, one per row. That cannot work, because what MPI_Send/MPI_Recv do is transfer ARRAY_SIZE contiguous elements starting at the buffer address, and the rows of a pointer-to-pointers matrix are separate, non-contiguous allocations.
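For illustration, here is a minimal sketch of the row-by-row workaround, assuming exactly two ranks and a matrix allocated as a pointer to pointers like the question's A (the names M and rank are made up for the example, not taken from the question):

/* Sketch: each row of a double** matrix is a separate allocation,
 * so it has to travel in its own MPI_Send/MPI_Recv. */
#include <mpi.h>
#include <stdlib.h>

#define N 5

int main(int argc, char *argv[]) {
  int rank, tag = 1;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* allocate (N+2) x (N+2) as pointer to pointers, like the question does */
  double **M = malloc((N + 2) * sizeof(double *));
  for (int i = 0; i < N + 2; i++)
    M[i] = calloc(N + 2, sizeof(double));

  if (rank == 0) {
    for (int i = 0; i < N + 2; i++)   /* one send per row */
      MPI_Send(M[i], N + 2, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
  } else if (rank == 1) {
    for (int i = 0; i < N + 2; i++)   /* one matching receive per row */
      MPI_Recv(M[i], N + 2, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD,
               MPI_STATUS_IGNORE);
  }

  for (int i = 0; i < N + 2; i++)
    free(M[i]);
  free(M);
  MPI_Finalize();
  return 0;
}

Run with at least two ranks, e.g. mpirun -np 2 ./a.out.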

In HPC we instead use a single 1D array of ARRAY_SIZE elements directly, plus a macro (for example) to get 2D-style access, because it is fast, cache-friendly, and lets the whole matrix travel in one call instead of N+2 calls (which is bad for latency); see the sketch below.
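A minimal sketch of that 1D layout, again assuming two ranks; the IDX macro and the variable name grid are hypothetical, chosen for the example:

/* Sketch: one contiguous buffer plus an indexing macro gives 2D access
 * and lets the whole matrix travel in a single MPI_Send/MPI_Recv. */
#include <mpi.h>
#include <stdlib.h>

#define N 5
#define ARRAY_SIZE ((N + 2) * (N + 2))
#define IDX(i, j) ((i) * (N + 2) + (j))   /* 2D indexing into the 1D buffer */

int main(int argc, char *argv[]) {
  int rank, tag = 1;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double *grid = malloc(ARRAY_SIZE * sizeof(double));  /* one contiguous block */

  if (rank == 0) {
    /* fill interior with random values, boundary with zeros, as in the question */
    for (int i = 0; i < N + 2; i++)
      for (int j = 0; j < N + 2; j++)
        grid[IDX(i, j)] = (i == 0 || j == 0 || i == N + 1 || j == N + 1)
                            ? 0.0
                            : rand() % 10 + 1;
    MPI_Send(grid, ARRAY_SIZE, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);  /* single call */
  } else if (rank == 1) {
    MPI_Recv(grid, ARRAY_SIZE, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
  }

  free(grid);
  MPI_Finalize();
  return 0;
}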

Regarding "c - Error passing an array with MPI_Send and MPI_Recv", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53363864/
