mpi + infiniband too many connections

Tags: mpi, infiniband

I am running an MPI application on a cluster using 4 nodes, each with 64 cores. The application uses an all-to-all communication pattern.

Executing the application like this runs fine:

$: mpirun -npernode 36 ./Application

Adding more processes per node makes the application crash:

$: mpirun -npernode 37 ./Application

--------------------------------------------------------------------------
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.

For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host:             laser045
Local device:           qib0
Queue pair type:        Reliable connected (RC)
--------------------------------------------------------------------------
[laser045:15359] *** An error occurred in MPI_Issend
[laser045:15359] *** on communicator MPI_COMM_WORLD
[laser045:15359] *** MPI_ERR_OTHER: known error not in list
[laser045:15359] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[laser040:49950] [[53382,0],0]->[[53382,1],30] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 163]
[laser040:49950] [[53382,0],0]->[[53382,1],21] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 154]
--------------------------------------------------------------------------
mpirun has exited due to process rank 128 with PID 15358 on
node laser045 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[laser040:49950] 4 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
[laser040:49950] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[laser040:49950] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal

Edit: added some source code of the all-to-all communication pattern:

// Send data to all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }

    MPI_Request request;
    MPI_Issend(&data, dataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &request);
    requests.push_back(request);
}

// Recv data from all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
       continue;
    }

    MPI_Status status;
    MPI_Recv(&recvData, recvDataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
}

// Finish communication operations
for(MPI_Request &r: requests){
    MPI_Status status;
    MPI_Wait(&r, &status);
}
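
For reference, a self-contained version of this excerpt might look like the sketch below; the declarations of requests, data and recvData are not shown in the original post, so the buffer types and sizes here are assumed purely for illustration:

#include <mpi.h>
#include <vector>

int main(int argc, char** argv){
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Illustrative buffers; the original post does not show these definitions.
    const int dataSize = 1, recvDataSize = 1;
    double data = rank, recvData = 0.0;
    std::vector<MPI_Request> requests;

    // Send data to all other ranks (same pattern as in the question).
    for(int i = 0; i < size; ++i){
        if(rank == i) continue;
        MPI_Request request;
        MPI_Issend(&data, dataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &request);
        requests.push_back(request);
    }

    // Recv data from all other ranks.
    for(int i = 0; i < size; ++i){
        if(rank == i) continue;
        MPI_Recv(&recvData, recvDataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    // Finish the outstanding sends.
    MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}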

Is there something I can do as a cluster user, or some advice I could pass on to the cluster admin?

Best answer

The mca_oob_tcp_msg_send_handler error lines probably indicate that the node corresponding to the receiving rank died (ran out of memory or received a SIGSEGV):

http://www.open-mpi.org/faq/?category=tcp#tcp-connection-errors

The OOB (out-of-band) framework in Open MPI is used for control messages, not for the messages of your application. Application messages usually go through a Byte Transfer Layer (BTL) such as self, sm, vader, openib (InfiniBand), and so on.
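
For instance, application traffic can be pinned to specific BTLs on the mpirun command line; the component list below (self, sm, openib) is just the usual choice for loopback, shared memory and InfiniBand, and is not taken from the original post:

$: mpirun --mca btl self,sm,openib -npernode 37 ./Application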

The output of "ompi_info -a" is useful in that regard.
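
For example, the openib BTL parameters (including the receive-queue configuration and connection-related limits) can be filtered out of that output; exact parameter names can differ between Open MPI versions:

$: ompi_info -a | grep btl_openib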

Finally, the question does not specify whether the InfiniBand hardware vendor is Mellanox, so the XRC option might not work (for example, Intel/QLogic InfiniBand does not support it).
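
As a rough back-of-the-envelope check (my own arithmetic, not part of the answer): with -npernode 37 on 4 nodes there are 148 ranks, so each process may open up to 147 reliable-connected queue pairs, and the openib BTL can create several QPs per peer, which adds up to thousands of QPs per HCA; it is plausible that qib0 simply cannot allocate that many, while -npernode 36 stays just below the limit. On Mellanox hardware, XRC is typically requested through X-type receive queues, roughly as sketched below; the queue sizes are purely illustrative, and as noted above this will not help on Intel/QLogic HCAs:

$: mpirun --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 -npernode 37 ./Application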

A similar question about "mpi + infiniband too many connections" on Stack Overflow: https://stackoverflow.com/questions/26576329/
