python - Signal handler hangs in Popen.wait(timeout) if an infinite wait() has already started

Tags: python subprocess wait

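I have run into a Python subprocess issue that I do not understand; I have reproduced it on Python 3.6 and 3.7. I have a program, call it Main, that launches an external process, call it "Slave", using subprocess.Popen(). Main registers a SIGTERM signal handler. Main waits for the Slave process to complete using either proc.wait(None) or proc.wait(timeout). The Slave process can be interrupted by sending a SIGTERM signal to Main. The SIGTERM handler sends a SIGINT signal to the Slave and calls wait(30) for it to terminate. If Main is using wait(None), the handler's wait(30) waits the full 30 seconds even though the Slave process has already terminated. If Main is using the wait(timeout) version, the handler's wait(30) returns as soon as the Slave terminates.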

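Here is a small test app that demonstrates the issue. Run it via python wait_test.py to use the non-timeout wait(None). Run it via python wait_test.py <timeout value> to give the Main wait a specific timeout.

Once the program is running, execute kill -15 <pid> and see how the app reacts.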

#
# Save this to a file called wait_test.py
#
import signal
import subprocess
import sys
from datetime import datetime

slave_proc = None


def sigterm_handler(signum, stack):
    print("Process received SIGTERM signal {} while processing job!".format(signum))
    print("slave_proc is {}".format(slave_proc))

    if slave_proc is not None:
        try:
            print("{}: Sending SIGINT to slave.".format(datetime.now()))
            slave_proc.send_signal(signal.SIGINT)
            slave_proc.wait(30)
            print("{}: Handler wait completed.".format(datetime.now()))
        except subprocess.TimeoutExpired:
            slave_proc.terminate()
        except Exception as exception:
            print('Sigterm Exception: {}'.format(exception))
            slave_proc.terminate()
            slave_proc.send_signal(signal.SIGKILL)


def main(wait_val=None):
    with open("stdout.txt", 'w+') as stdout:
        with open("stderr.txt", 'w+') as stderr:
            proc = subprocess.Popen(["python", "wait_test.py", "slave"],
                                    stdout=stdout,
                                    stderr=stderr,
                                    universal_newlines=True)

    print('Slave Started')

    global slave_proc
    slave_proc = proc

    try:
        proc.wait(wait_val)    # If this is a no-timeout wait, ie: wait(None), then will hang in sigterm_handler.
        print('Slave Finished by itself.')
    except subprocess.TimeoutExpired as te:
        print(te)
        print('Slave finished by timeout')
        proc.send_signal(signal.SIGINT)
        proc.wait()

    print("Job completed")


if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'slave':
        while True:
            pass

    signal.signal(signal.SIGTERM, sigterm_handler)
    main(int(sys.argv[1]) if len(sys.argv) > 1 else None)
    print("{}: Exiting main.".format(datetime.now()))

Here are samples of the two runs:

Note here the 30 second delay
--------------------------------
[mkurtz@localhost testing]$ python wait_test.py
Slave Started
Process received SIGTERM signal 15 while processing job!
slave_proc is <subprocess.Popen object at 0x7f79b50e8d90>
2022-03-30 11:08:15.526319: Sending SIGINT to slave.   <--- 11:08:15
Slave Finished by itself.
Job completed
2022-03-30 11:08:45.526942: Exiting main.              <--- 11:08:45


Note here the instantaneous shutdown
-------------------------------------
[mkurtz@localhost testing]$ python wait_test.py 100
Slave Started
Process received SIGTERM signal 15 while processing job!
slave_proc is <subprocess.Popen object at 0x7fa2412a2dd0>
2022-03-30 11:10:03.649931: Sending SIGINT to slave.   <--- 11:10:03.649
2022-03-30 11:10:03.653170: Handler wait completed.    <--- 11:10:03.653
Slave Finished by itself.
Job completed
2022-03-30 11:10:03.673234: Exiting main.              <--- 11:10:03.673

These particular tests were run on CentOS 7 using Python 3.7.9. Can someone explain this behavior?

Best answer

The Popen class has an internal lock for wait operations:

        # Held while anything is calling waitpid before returncode has been
        # updated to prevent clobbering returncode if wait() or poll() are
        # called from multiple threads at once.  After acquiring the lock,
        # code must re-check self.returncode to see if another thread just
        # finished a waitpid() call.
        self._waitpid_lock = threading.Lock()

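The major difference between wait() and wait(timeout=...) is that the former waits indefinitely while holding the lock, whereas the latter is a busy loop that releases the lock on each iteration.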

            if timeout is not None:
                ...
                while True:
                    if self._waitpid_lock.acquire(False):
                        try:
                            ...
                            # wait without any delay
                            (pid, sts) = self._try_wait(os.WNOHANG)
                            ...
                        finally:
                            self._waitpid_lock.release()
                    ...
                    time.sleep(delay)
            else:
                while self.returncode is None:
                    with self._waitpid_lock:  # acquire lock unconditionally
                        ...
                        # wait indefinitely
                        (pid, sts) = self._try_wait(0)
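
To make the role of the non-blocking acquire concrete, here is a minimal sketch that models only the locking, not the actual waitpid calls. The names waitpid_lock and poll_once are made up for illustration and are not part of subprocess; the sketch assumes, as in the excerpt above, that the lock is a plain (non-reentrant) threading.Lock.

```python
import threading

# Stand-in for Popen._waitpid_lock (a plain, non-reentrant threading.Lock).
waitpid_lock = threading.Lock()


def poll_once(lock):
    """Mimic one iteration of the wait(timeout) loop: try the lock, never block."""
    if lock.acquire(False):
        try:
            return "checked child status"
        finally:
            lock.release()
    return "lock busy, will sleep and retry"


# Nobody is waiting: the polling path gets the lock.
result_when_free = poll_once(waitpid_lock)

# Simulate an outer wait() that already holds the lock. Because the lock is
# not reentrant, even the *same* thread cannot take it a second time.
with waitpid_lock:
    result_when_held = poll_once(waitpid_lock)

print(result_when_free)
print(result_when_held)
```

While the lock is held, every polling iteration falls through to the sleep without ever checking the child, which is the situation a wait(30) finds itself in when it runs on top of an unfinished wait(None).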

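This is not a problem for regular concurrent code, i.e. threading, since the thread that runs wait() and holds the lock is woken up as soon as the subprocess finishes. That in turn lets all other threads waiting on the lock/subprocess proceed promptly.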
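
As a rough illustration of the threaded case, the sketch below waits on one child from two threads at once. The helper name wait_from_two_threads and the short-lived child command are assumptions chosen for the demo, not something from the question.

```python
import subprocess
import sys
import threading


def wait_from_two_threads():
    """Wait on one child from a worker thread and from the calling thread."""
    # Hypothetical short-lived child: sleeps briefly, then exits with code 0.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])
    worker_results = []

    # The worker uses the no-timeout path and holds the wait lock while blocked.
    worker = threading.Thread(target=lambda: worker_results.append(proc.wait()))
    worker.start()

    # The calling thread uses the polling path; it returns once the return
    # code has been recorded and the lock is free again.
    main_result = proc.wait(timeout=10)
    worker.join()
    return worker_results, main_result


if __name__ == "__main__":
    print(wait_from_two_threads())
```

Both waits should come back shortly after the child exits, because the worker thread keeps running independently and releases the lock as soon as its waitpid call returns.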


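However, things look different when a) the main thread holds the lock in wait() and b) a signal handler attempts to wait. A subtlety of signal handlers is that they interrupt the main thread: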

signal: Signals and Threads

Python signal handlers are always executed in the main Python thread of the main interpreter, even if the signal was received in another thread. […]

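Since the signal handler runs in the main thread, the main thread's regular code execution is paused until the signal handler finishes!

By running wait in the signal handler, a) the signal handler blocks waiting for the lock and b) the lock blocks waiting for the signal handler. Only once the signal handler's wait times out does the "main thread" resume, receive confirmation that the subprocess has finished, set the return code and release the lock.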
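
One possible way around this, not taken from the answer itself, is to keep the handler trivial and do all waiting in the regular main-thread code. The sketch below assumes a POSIX system; stop_requested, wait_interruptibly, and the poll_interval and grace parameters are hypothetical names introduced here.

```python
import signal
import subprocess
import sys
import threading

# Hypothetical flag: set by the handler, read by the main-thread wait loop.
stop_requested = threading.Event()


def sigterm_handler(signum, frame):
    # Only record the request; never call proc.wait() from inside the handler.
    stop_requested.set()


def wait_interruptibly(proc, poll_interval=0.5, grace=30):
    """Wait for proc in short slices so a stop request is noticed promptly."""
    while True:
        try:
            return proc.wait(timeout=poll_interval)
        except subprocess.TimeoutExpired:
            pass
        if stop_requested.is_set():
            break
    # Stop requested: ask the child to quit, then give it a grace period.
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Last resort if the child ignores SIGINT.
        proc.kill()
        return proc.wait()


def run_job(cmd):
    """Register the handler, start the child, and wait for it."""
    signal.signal(signal.SIGTERM, sigterm_handler)
    return wait_interruptibly(subprocess.Popen(cmd))
```

In the question's script this would correspond to calling something like run_job(["python", "wait_test.py", "slave"]) from main(). Because every wait here goes through the timeout path from ordinary code, no wait ever runs on top of another wait that still holds the lock.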

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/71682249/
