linux - Open MPI with Python using more than 3 processes

Tags: linux networking ssh python

I ran a simple MPI Python program across more than 3 processes. For example:

mpiexec -host master,w1,w2,w3 python code.py

It fails with this error:

ssh: Could not resolve hostname w3: Name or service not known
ORTE was unable to reliably start one or more daemons.

This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).

However, if I run the program with any two of w1, w2, and w3, it works. For example:

mpiexec -host master,w1,w3 python code.py

Here is the code:

 import random
 import numpy as np
 from mpi4py import MPI

 comm = MPI.COMM_WORLD
 rank = comm.rank  # index of this process within the communicator
 size = comm.size  # total number of processes launched

 if rank == 0:
     print(rank, 'worker')
 else:
     print(rank, 'worker')

How can I fix this? Thanks.

Best answer

The output line "ssh: Could not resolve hostname w3: Name or service not known" makes the problem clear:

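Your master node (the machine running mpiexec) cannot resolve the hostname w3 to an IP address. You can add a name-to-IP mapping to /etc/hosts on that machine, one entry per line in the format "ip name". For example: 192.0.2.13 w3 (the address here is only a placeholder; use w3's actual IP address).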
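
Before launching mpiexec, it can help to confirm which hostnames the master node is able to resolve. The following is a minimal sketch, not part of the original answer: the function name unresolved_hosts and its resolve parameter are hypothetical, and it assumes the standard library's socket.getaddrinfo reflects the same lookup (including /etc/hosts) that ssh performs on a typical Linux system.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.getaddrinfo):
    """Return the names in `hosts` that fail to resolve to an address.

    `resolve` is a hypothetical hook so the lookup can be swapped out;
    by default it is socket.getaddrinfo from the standard library.
    """
    failed = []
    for name in hosts:
        try:
            # Port 22 is only a placeholder for the lookup;
            # no connection is opened to the host.
            resolve(name, 22)
        except socket.gaierror:
            # Raised when the name cannot be resolved.
            failed.append(name)
    return failed
```

Calling unresolved_hosts(["master", "w1", "w2", "w3"]) on the master node should return the names that still need an /etc/hosts entry (or a DNS record). An empty list suggests name resolution is not the cause of the failure.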

Source: this question on Stack Overflow: https://stackoverflow.com/questions/42989574/
