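I have a program that generates output indefinitely. I want to sample its output for one second and pipe that sample to gzip. I am using the timeout util to limit the execution, but the problem is that gzip gets killed as well.
For example: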
$ /usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:05 /tmp/foo.gz
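As you can see, the gzip command is Terminated, so its output is an empty file (its buffer is lost). I don't understand how timeout manages to kill the process that reads its stdout, or how to fix it. Even wrapping the whole thing in another bash gives the same result:
$ bash -c '/usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz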
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:30 /tmp/foo.gz
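If I combine timeout with setsid, it works, which makes me think this has something to do with mixed-up process groups. Still, it is hard to accept that the current behavior is "by design", since it makes the timeout command very tricky to use in shell pipelines.
Environment: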
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ timeout --version
timeout (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Padraig Brady.
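For reference, the setsid variant mentioned above might look like the sketch below. The exact invocation is an assumption (the question only says that timeout was combined with setsid), and the temporary file created with mktemp is a stand-in for /tmp/foo.gz. The idea is that setsid starts timeout in a new session, so whatever timeout signals in its own process group should not include gzip.

```shell
# Sketch of the setsid workaround (assumed invocation, not taken verbatim
# from the question): run timeout in a new session and process group.
out=$(mktemp)   # hypothetical output file, stands in for /tmp/foo.gz

setsid timeout 1 bash -c "echo asdf; sleep 5" | gzip > "$out"

# Decompress the captured sample to check that gzip was able to finish.
sample=$(gzip -dc < "$out")
echo "captured: $sample"
rm -f "$out"
```

If gzip survives, the decompressed sample should contain the line that was written before the timeout fired.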
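Update: KamilCuk is spot on with his strace analysis. It also explains why wrapping timeout in another bash did not help either: bash seems to have an optimization where, if it has only one command to run, it does not fork but execs, replacing itself. If you add another command to the wrapping bash, it does fork, thereby creating a new process group and limiting the blast radius of the timeout command. I.e.:
$ bash -c 'true; /usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz
(note the leading true). I still think that using timeout in pipelines is a kind of black magic, but that is another story.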
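A rough way to observe the exec-versus-fork behavior described above is to compare the PID of the wrapping bash with the PID of the command it runs. This is only a sketch: the variable names are made up for illustration, and the result for the second case may vary between bash versions, so no particular output is claimed here.

```shell
# Each command prints the wrapper bash's PID ($$ is expanded by the wrapper)
# and the PID of the sh it runs (\$\$ is expanded later, by sh itself).
# Equal PIDs suggest that bash exec'd the command instead of forking.
single=$(bash -c 'sh -c "echo wrapper=$$ child=\$\$"')
listed=$(bash -c 'true; sh -c "echo wrapper=$$ child=\$\$"')

echo "single command:    $single"
echo "with leading true: $listed"
```

When the two PIDs on a line differ, the wrapper forked, which is the situation in which timeout ends up as a child process rather than as the pipeline's process group leader.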
Best answer
$ strace -ff -e trace=setpgid,kill,exit_group,exit,execve,wait4 bash --norc --noprofile -ic "timeout -v 1 bash --norc --noprofile -c 'echo asdf ; sleep 5' | { sleep 2; echo 123; }"
execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-ic", "timeout -v 1 bash --norc --nopro"...], 0x7ffeb8ef7ef8 /* 76 vars */) = 0
setpgid(0, 28995) = 0
strace: Process 28996 attached
[pid 28995] setpgid(28996, 28996) = 0
[pid 28996] setpgid(28996, 28996) = 0
strace: Process 28997 attached
[pid 28995] setpgid(28997, 28996) = 0
[pid 28995] wait4(-1, <unfinished ...>
[pid 28997] setpgid(28997, 28996) = 0
[pid 28996] execve("/usr/bin/timeout", ["timeout", "-v", "1", "bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x560da0ff57e0 /* 76 vars */strace: Process 28998 attached
) = 0
[pid 28997] wait4(-1, <unfinished ...>
[pid 28998] execve("/usr/bin/sleep", ["sleep", "2"], 0x560da0ff57e0 /* 76 vars */) = 0
[pid 28996] setpgid(0, 0) = 0
strace: Process 28999 attached
[pid 28996] wait4(28999, 0x7ffd7eb5e96c, WNOHANG, NULL) = 0
[pid 28999] execve("/usr/local/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = -1 ENOENT (No such file or directory)
[pid 28999] execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = 0
[pid 28999] execve("/usr/bin/sleep", ["sleep", "5"], 0x55a84be27270 /* 76 vars */) = 0
[pid 28996] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=0, si_ptr=NULL} ---
timeout: sending signal TERM to command ‘bash’
[pid 28996] kill(28999, SIGTERM) = 0
[pid 28999] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] kill(0, SIGTERM <unfinished ...>
[pid 28997] <... wait4 resumed>0x7ffc114a9600, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 28996] <... kill resumed>) = 0
[pid 28999] +++ killed by SIGTERM +++
[pid 28998] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28997] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28999, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 28997] +++ killed by SIGTERM +++
[pid 28995] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WSTOPPED|WCONTINUED, NULL) = 28997
[pid 28998] +++ killed by SIGTERM +++
[pid 28995] wait4(-1, <unfinished ...>
[pid 28996] kill(28999, SIGCONT) = 0
[pid 28996] kill(0, SIGCONT) = 0
[pid 28996] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] wait4(28999, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG, NULL) = 28999
[pid 28996] exit_group(124) = ?
[pid 28996] +++ exited with 124 +++
<... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 124}], WSTOPPED|WCONTINUED, NULL) = 28996
Terminated
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28997, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffc114a9710, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
setpgid(0, 28992) = 0
exit_group(143) = ?
+++ exited with 143 +++
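So what happens is that timeout tries to be smart and kills the whole process group. From what I understand, the situation is like this:
bash puts the pipeline into a new process group led by its first process: setpgid(28996, 28996)
timeout tries to put itself into its own process group, which changes nothing because it already leads the pipeline's group: setpgid(0, 0)
timeout kills the whole process group: kill(0, SIGTERM <unfinished ...>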
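You can use command grouping { ... } to make bash start a new process group for the left-hand side. You can also use timeout --foreground, but then timeout will only kill the foreground process. So while bash will die, the gzip process will still wait for the sleep 5 running in the background, because that sleep still holds open the pipe feeding gzip's stdin. My guess (also from the commit message) is that this might be the intent, so that timeout can kill the whole pipeline as if it were a magic shell builtin. Also, the behavior differs between job control enabled and disabled, and therefore between interactive and non-interactive shells.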
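The two options mentioned above could be sketched as follows. The exact command lines are assumptions (the answer names the techniques without spelling them out), and the mktemp files are hypothetical stand-ins for /tmp/foo.gz. Since the behavior depends on job control, the difference from the plain pipeline is expected to show up mainly in an interactive shell.

```shell
# Hypothetical output files for the two variants.
out_group=$(mktemp)
out_fg=$(mktemp)

# Option 1: command grouping on the left-hand side, so that timeout runs
# as a child of the group's subshell rather than as the pipeline leader.
{ timeout 1 bash -c "echo asdf; sleep 5"; } | gzip > "$out_group"

# Option 2: --foreground makes timeout signal only its direct child.
# As noted in the answer, gzip may keep waiting until a leftover
# "sleep 5" exits, so this variant can take about five seconds.
timeout --foreground 1 bash -c "echo asdf; sleep 5" | gzip > "$out_fg"

# Decompress both samples to check that gzip finished in each case.
grouped=$(gzip -dc < "$out_group")
foreground=$(gzip -dc < "$out_fg")
echo "grouping:     $grouped"
echo "--foreground: $foreground"
rm -f "$out_group" "$out_fg"
```

The grouping variant keeps the one-second sampling window, while the --foreground variant trades that for not signalling the process group at all.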
Regarding "linux - Why does /usr/bin/timeout kill the whole pipeline?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68479811/