OpenMPI machine-specific runtime error

Tags: openmpi

Thank you for reading my post. I have just started using OpenMPI. I installed OpenMPI 1.6.5 on my Mac (OS X 10.5.8) and on my Linux machine (Mint 14). Both computers can compile and run very simple programs, such as Hello World or sending a single integer from one process to another. However, whenever I try to send an array with MPI_Bcast() or MPI_Send(), the program crashes with a segmentation fault.

#include <iostream>
#include <stdlib.h>
#include <mpi.h>
using namespace std;

int main(int argc,char** argv)
{
    int np,nid;
    float *a;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Comm_rank(MPI_COMM_WORLD,&nid); 

    if (nid == 0)
    {
        a = (float*) calloc(9,sizeof(float));
        for (int i = 0; i < 9; i++)
        {
            a[i] = i;
        }
    }

    MPI_Bcast(a,9,MPI_FLOAT,0,MPI_COMM_WORLD);  

    MPI_Finalize();
    return 0;
}    

Here is the error message:

[rsove-M11BB:02854] *** Process received signal ***
[rsove-M11BB:02854] Signal: Segmentation fault (11)
[rsove-M11BB:02854] Signal code: Address not mapped (1)
[rsove-M11BB:02854] Failing at address: (nil)
[rsove-M11BB:02855] *** Process received signal ***
[rsove-M11BB:02855] Signal: Segmentation fault (11)
[rsove-M11BB:02855] Signal code: Address not mapped (1)
[rsove-M11BB:02855] Failing at address: (nil)
[rsove-M11BB:02854] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fddf08f64a0]
[rsove-M11BB:02854] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fddf0a02953]
[rsove-M11BB:02854] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fddf12a0b35]
[rsove-M11BB:02854] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fddece38ee5]
[rsove-M11BB:02854] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fddec61477d]
[rsove-M11BB:02854] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fddf12ac2ea]
[rsove-M11BB:02854] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fddf11fce2d]
[rsove-M11BB:02854] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x4d6) [0x7fddeb73e346]
[rsove-M11BB:02854] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fddeb73e85b]
[rsove-M11BB:02854] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fddeb735b5c]
[rsove-M11BB:02854] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fddeb951799]
[rsove-M11BB:02854] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fddf12094d8]
[rsove-M11BB:02854] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02854] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fddf08e176d]
[rsove-M11BB:02854] [14] Test() [0x408df9]
[rsove-M11BB:02854] *** End of error message ***
[rsove-M11BB:02855] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa4c67be4a0]
[rsove-M11BB:02855] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fa4c68ca953]
[rsove-M11BB:02855] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fa4c7168b35]
[rsove-M11BB:02855] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fa4c2d00ee5]
[rsove-M11BB:02855] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fa4c24dc77d]
[rsove-M11BB:02855] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fa4c71742ea]
[rsove-M11BB:02855] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fa4c70c4e2d]
[rsove-M11BB:02855] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x59c) [0x7fa4c160640c]
[rsove-M11BB:02855] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fa4c160685b]
[rsove-M11BB:02855] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fa4c15fdb5c]
[rsove-M11BB:02855] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fa4c1819799]
[rsove-M11BB:02855] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fa4c70d14d8]
[rsove-M11BB:02855] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02855] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4c67a976d]
[rsove-M11BB:02855] [14] Test() [0x408df9]
[rsove-M11BB:02855] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 2854 on node rsove-M11BB exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The strange thing is that when I run the same code on a friend's computer, it compiles and runs without any problems.

Thanks in advance for your help.

Best answer

You have made a very typical mistake. MPI_Bcast requires that an already allocated buffer be passed as its first argument on the root rank and on all other ranks alike. In your code, only rank 0 allocates the array, so every other rank passes an uninitialized pointer to the broadcast and crashes. The code therefore has to be modified, for example like this:

// Allocate the array everywhere
a = (float*) calloc(9,sizeof(float));
// Initialise the array at rank 0 only
if (nid == 0)
{
    for (int i = 0; i < 9; i++)
    {
        a[i] = i;
    }
}
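
For completeness, a full corrected program could look like the following minimal sketch. The fix itself is only the allocation of the buffer on every rank before the broadcast; the sanity-check printout and the free() call are additions for illustration.

#include <iostream>
#include <stdlib.h>
#include <mpi.h>
using namespace std;

int main(int argc,char** argv)
{
    int np,nid;
    float *a;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Comm_rank(MPI_COMM_WORLD,&nid);

    // Allocate the buffer on EVERY rank, not only on the root
    a = (float*) calloc(9,sizeof(float));

    // Initialise the data on rank 0 only
    if (nid == 0)
    {
        for (int i = 0; i < 9; i++)
        {
            a[i] = i;
        }
    }

    // All ranks now pass a valid buffer of 9 floats to the broadcast
    MPI_Bcast(a,9,MPI_FLOAT,0,MPI_COMM_WORLD);

    // Sanity check: every rank should now see a[8] == 8
    cout << "rank " << nid << " of " << np << ": a[8] = " << a[8] << endl;

    free(a);
    MPI_Finalize();
    return 0;
}

Assuming the file is saved as bcast_array.cpp (a name chosen here for illustration), it can be compiled and run with the usual Open MPI wrappers, e.g. mpic++ bcast_array.cpp -o bcast_array followed by mpirun -np 2 ./bcast_array.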

Regarding this OpenMPI machine-specific runtime error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21343651/
