python - Why does mpirun hang in a loop?

Tags: python linux mpi

Here are my shell script and my Python code.

$ cat go

while true
do
echo "------->"
python3 -m mpi4py ./go.py
echo "<------"
done

This script runs go.py (via python3 -m mpi4py) over and over in an endless loop.

$ cat go.py

import mpi4py.MPI as MPI  # by default, importing mpi4py.MPI initializes MPI

print( "######", MPI.Is_initialized())

comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
comm_size = comm.Get_size()

# point to point communication: send to the right neighbour, receive from the left one
data_send = [comm_rank]*5
comm.send(data_send,dest=(comm_rank+1)%comm_size)
data_recv =comm.recv(source=(comm_rank-1)%comm_size)
print("my rank is %d, and Ireceived:" % comm_rank)
print( data_recv )

MPI.Finalize()

print( "######", MPI.Is_finalized())

This Python code only does a simple point-to-point exchange and prints the result.
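To make the expected output easy to check, here is a plain-Python model of the ring exchange in go.py. It does not use MPI at all. ring_exchange is a hypothetical helper and is not part of mpi4py. Each rank r sends [r] * 5 to (r + 1) % size and receives from (r - 1) % size, so rank r ends up holding [(r - 1) % size] * 5.

```python
def ring_exchange(size, length=5):
    """Model go.py's ring exchange without MPI.

    Returns {rank: received_list} for a job of `size` ranks.
    """
    mailbox = {}
    # Step 1: every rank "sends" [rank] * length to its right neighbour,
    # like comm.send(data_send, dest=(comm_rank + 1) % comm_size).
    for rank in range(size):
        mailbox[(rank + 1) % size] = [rank] * length
    # Step 2: every rank "receives" the message addressed to it. That message
    # was written by the left neighbour (rank - 1) % size, which is go.py's
    # `source`. Python's % is non-negative here, so rank 0 hears from size - 1.
    return {rank: mailbox[rank] for rank in range(size)}
```

For two ranks, ring_exchange(2) gives {0: [1, 1, 1, 1, 1], 1: [0, 0, 0, 0, 0]}. This matches what the first iteration below prints.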

When I run the go script, go.py runs once and exits normally. When go.py runs a second time, it hangs.

$ mpirun --mca orte_base_help_aggregate 0 -np 2 sh ./go

------->
------->
--------------------------------------------------------------------------
[[27909,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[27909,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
###### True
###### True
my rank is 0, and Ireceived:
[1, 1, 1, 1, 1]
my rank is 1, and Ireceived:
[0, 0, 0, 0, 0]
###### True
###### True
<------
------->
<------
------->
--------------------------------------------------------------------------
[[27909,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[27909,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

And then it hangs forever.

Why does it hang, and how can I keep this loop running?

By the way: I have two kinds of jobs, A and B, to run. Job A keeps running, while job B finishes and exits. So I can't run them like this:

while true
do
  echo "------->"
  mpirun -np 2 A : -np 2 B
  echo "<------"
done

Is there any other way?

Best answer

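Long story short, you can't do that.

With mpirun -np 2 sh ./go, mpirun starts two copies of the shell script, and each copy occupies one rank of a single MPI job. The first python3 started by each shell initializes MPI as that rank, finalizes, and exits. The next python3 inherits the same environment from the shell. Presumably it then tries to join the same job as the same rank again, which the runtime does not support. The output is consistent with this: the second round of warnings shows the same process names, [[27909,1],0] and [[27909,1],1], and neither process ever prints its first ###### line. In other words, neither gets past MPI initialization.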
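The sketch below illustrates how the looping shell passes its rank assignment on. It assumes Open MPI, whose mpirun documents that it exports variables such as OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to every process it launches. advertised_rank and rank_seen_by_child are hypothetical helpers, not part of Open MPI or mpi4py. The child process stands in for one iteration's python3 and never touches MPI.

```python
import os
import subprocess
import sys


def advertised_rank(env=None):
    """Return the rank advertised in OMPI_COMM_WORLD_RANK, or None if unset.

    Assumes Open MPI's variable name; other MPI implementations use others.
    """
    env = os.environ if env is None else env
    value = env.get("OMPI_COMM_WORLD_RANK")
    return int(value) if value is not None else None


def rank_seen_by_child(env):
    """Start a child Python process with `env` and return the rank it sees.

    This mimics one iteration of the `go` loop: the child simply inherits
    the environment of the shell that mpirun started.
    """
    code = "import os; print(os.environ.get('OMPI_COMM_WORLD_RANK', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    text = result.stdout.strip()
    return int(text) if text else None
```

Under mpirun -np 2 sh ./go, every python3 the loop starts would see the same value here, iteration after iteration. That is consistent with each later iteration presenting itself as an already-finished rank.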

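Here is what you should do instead: start a fresh mpirun on every iteration, so that each run of go.py is its own MPI job: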

while true
do
  echo "------->"
  mpirun --mca orte_base_help_aggregate 0 -np 2 python3 -m mpi4py ./go.py
  echo "<------"
done
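If you prefer to drive the loop from Python, here is a minimal sketch of the same pattern, with one new process per iteration. It adds a timeout, so a hung run is reported instead of blocking the loop forever. run_in_loop is a hypothetical helper that uses only the standard subprocess module. The timeout value is an assumption you would tune to your job.

```python
import subprocess
import sys


def run_in_loop(cmd, iterations, timeout):
    """Run `cmd` as a brand-new process `iterations` times.

    Returns one entry per iteration: the exit code, or None if the run took
    longer than `timeout` seconds (subprocess.run then kills the child).
    """
    results = []
    for _ in range(iterations):
        sys.stdout.write("------->\n")
        sys.stdout.flush()
        try:
            completed = subprocess.run(cmd, timeout=timeout)
            results.append(completed.returncode)
        except subprocess.TimeoutExpired:
            results.append(None)  # treat a timeout as a hung run
        sys.stdout.write("<------\n")
        sys.stdout.flush()
    return results
```

For example: run_in_loop(["mpirun", "--mca", "orte_base_help_aggregate", "0", "-np", "2", "python3", "-m", "mpi4py", "./go.py"], iterations=3, timeout=60). When the timeout expires, subprocess.run kills only the direct child, here mpirun. Whether the ranks mpirun started are cleaned up as well depends on the MPI implementation.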

Regarding "python - Why does mpirun hang in a loop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47320846/
