c - OpenMPI + Fortran + C basic test throws different errors depending on weird things

Tags: c fortran mpi openmpi

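I am running into strange problems when trying to use OpenMPI with Fortran and C together. This is a Fortran program that calls a C function, and both use OpenMPI. I have managed to narrow the error down to this very simple test case:

File mpi_hello_world.F90: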

program mpi_hello_world
  implicit none
  include 'mpif.h'
  integer :: ierror
  call MPI_Init(ierror)
  ! ERROR CHANGES IF I COMMENT THE FOLLOWING LINE
  write(*,*) 'before c_function: MPI_COMM_WORLD=',MPI_COMM_WORLD
  call c_function(MPI_COMM_WORLD)
  call MPI_Finalize(ierror)
end program mpi_hello_world

File c_function.c:

#include "mpi.h"
#include <stdio.h>
void c_function_(MPI_Comm *comm) {
    printf("MPI_Comm comm=%d\n",*comm);
    int world_rank;
    MPI_Comm_rank(*comm, &world_rank);
}

The program's output is:

before c_function: MPI_COMM_WORLD=           0
MPI_Comm comm=0

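It seems the variable is passed correctly. After that, I get one of two runtime errors, depending on whether I comment out the line indicated in the code. With the line left in as shown (uncommented), I get a segmentation fault: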

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2B5330A9A777
#1  0x2B5330A9AD7E
#2  0x2B5331607D3F
#3  0x2B5331350D26
#4  0x4015D2 in c_function_
#5  0x401550 in MAIN__ at mpi_hello_world.F90:?
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 6088 on node pine exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
make: *** [run] Error 139

If I comment out that line, I get an error from OpenMPI instead:

[pine:6328] *** An error occurred in MPI_Comm_rank
[pine:6328] *** reported by process [46992071589889,46991237185536]
[pine:6328] *** on communicator MPI_COMM_WORLD
[pine:6328] *** MPI_ERR_COMM: invalid communicator
[pine:6328] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[pine:6328] ***    and potentially your MPI job)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

   Process name: [[12732,1],0]
   Exit code:    5
--------------------------------------------------------------------------

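My guess is that something is wrong with the library linking, but I don't know what. Any hints on how to debug this would be appreciated.

More information: I am using OpenMPI 1.8.4 to compile both the Fortran and C files. I am also running with the matching mpirun, as in /path/to/openmpi/1.8.4/common/bin/mpirun -n 1 test

To make sure the correct libraries are linked, I ran: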

[$]: ldd hello 
linux-vdso.so.1 =>  (0x00007ffee39d6000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f6a4dca5000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6a4d99f000)
libmpi_mpifh.so.2 => /usr/lib/openmpi/1.8.4/gcc/lib/libmpi_mpifh.so.2 (0x00007f6a4d74a000)
libmpi.so.1 => /usr/lib/openmpi/1.8.4/gcc/lib/libmpi.so.1 (0x00007f6a4d46e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6a4d0a9000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f6a4ce6c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6a4cc56000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6a4ca38000)
libopen-rte.so.7 => /usr/lib/openmpi/1.8.4/gcc/lib/libopen-rte.so.7 (0x00007f6a4c7bb000)
libopen-pal.so.6 => /usr/lib/openmpi/1.8.4/gcc/lib/libopen-pal.so.6 (0x00007f6a4c4cf000)
/lib64/ld-linux-x86-64.so.2 (0x000055dacdee9000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f6a4c2c3000)
libpciaccess.so.0 => /usr/lib/x86_64-linux-gnu/libpciaccess.so.0 (0x00007f6a4c0ba000)
libcudart.so.6.0 => /usr/lib/x86_64-linux-gnu/libcudart.so.6.0 (0x00007f6a4be69000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6a4bc64000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6a4ba5c000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f6a4b859000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f6a4b63f000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6a4b33b000)

Any ideas? Has anyone run into a similar problem?

Best answer

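MPI_Comm is defined in mpi.h, but its definition differs between MPI implementations: some use a pointer, others an int. In Open MPI it is a pointer to an internal structure, while the Fortran MPI_COMM_WORLD is a default integer, so dereferencing the argument as an MPI_Comm reads bytes that do not form a valid communicator handle. Which of the two errors you get then likely depends on whatever happens to lie in the adjacent memory.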

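You have to use the conversion routines mentioned in the first comment by @Giles. Also note that they may be just macros rather than functions. By far the easiest approach is to pass the Fortran integer to C and do the conversion there (note the MPI_Fint instead of int).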

#include <mpi.h>

void c_function_(MPI_Fint *fcomm) {
    int world_rank;
    /* Fortran passes by reference: dereference, then convert to a C handle. */
    MPI_Comm_rank(MPI_Comm_f2c(*fcomm), &world_rank);
}

If you need to call the conversion from Fortran, things get more complicated, mainly because it may be a macro. I personally use this (https://github.com/LadaF/PoisFFT/blob/master/src/f_mpi_comm_c2f.c):

#include <mpi.h>

// This function is callable from Fortran. MPI_Comm_c2f itself may be just a macro.
MPI_Fint f_MPI_Comm_c2f(MPI_Comm *comm) {
  return MPI_Comm_c2f(*comm);
}

and in Fortran:

  interface
    ! Intentionally returning integer and not integer(c_int).
    ! `c_handle` is a pointer to a C comm, not a C comm itself!
    ! We cannot be sure what Fortran type a C comm is!
    integer function MPI_Comm_c2f(c_handle) bind(C, name="f_MPI_Comm_c2f")
      use iso_c_binding
      type(c_ptr), value :: c_handle
    end function
  end interface

Regarding "c - OpenMPI + Fortran + C basic test throws different errors depending on weird things", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34896420/
