bash - Why does grepping the same man page sometimes give an error?

Tags: bash shell curl grep manpage

The exact same command:

man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '

sometimes gives the expected output:

       6      Couldn't resolve host. The given remote host was not resolved.

and sometimes gives an error:

Binary file (standard input) matches

For example:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
Binary file (standard input) matches

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
Binary file (standard input) matches

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

Versions of the relevant packages:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

$ grep --version
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

$ man --version
man 2.7.5

$ curl --version
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets

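I'm really scratching my head over this one.

I worked around my problem by adding the -a flag to both greps: man curl | grep -Pzoa 'EXIT CODES(.|\n)*AUTHORS' | grep -a '  6  '

But I'm really confused as to why it only fails some of the time...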

Best answer

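Because of the -z option, the first grep terminates its output with a NUL character instead of a newline. What happens next depends on the vagaries of buffering. If the second grep sees that NUL before it has processed the matching line, it decides that its input is binary and prints only the "Binary file" message. If it has not seen the NUL yet, it prints the match you want.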
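The trailing NUL can be made visible without involving the man page at all. The following is a minimal sketch, assuming GNU grep; the sample function and its three-line text are hypothetical stand-ins for the real curl man page.

```shell
# Hypothetical stand-in for the man page text (not real curl output).
sample() {
  printf 'EXIT CODES\n  6  Could not resolve host.\nAUTHORS\n'
}

# With -z, grep uses NUL as the record terminator on input and output,
# so the matching record is written with a trailing NUL byte.
sample | grep -z 'EXIT CODES' | od -c

# Count the NUL bytes in the output of the first grep.
nul_count=$(sample | grep -z 'EXIT CODES' | tr -cd '\000' | wc -c)
echo "NUL bytes in output: $nul_count"
```

The last byte in the od -c dump should be \0. That single byte is what the second grep may or may not have seen by the time it reaches the matching line.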

For example, this happened to work for me:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

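But if I put the output of the first grep in a temporary file and ask the second grep to read that file, then the second grep always complains that the input is binary: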

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' >tmpfile;  grep '  6  ' tmpfile
Binary file tmpfile matches
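The temporary-file behavior can be reproduced in isolation. This is a rough sketch, assuming GNU grep and mktemp; the file contents are a hypothetical stand-in for the man page section, and the exact wording and destination of the binary-file diagnostic depend on the grep version, so the sketch checks only whether the matching line is printed.

```shell
# Build a small file that ends in a NUL byte, like the output of grep -z.
# The text is a hypothetical stand-in for the real man page section.
tmpfile=$(mktemp)
printf 'EXIT CODES\n  6  Could not resolve host.\nAUTHORS\n\000' > "$tmpfile"

# Default mode: the NUL is in the first buffer grep reads, so the matching
# line is suppressed in favor of a "binary file matches" diagnostic.
plain=$(grep '  6  ' "$tmpfile" 2>/dev/null)

# With -a, grep treats the input as text and prints the matching line.
as_text=$(grep -a '  6  ' "$tmpfile")

printf 'default: %s\n' "$plain"
printf 'with -a: %s\n' "$as_text"
rm -f "$tmpfile"
```

Because a small regular file is read in one go, the NUL is always visible before any line is printed, which would explain why the file case is deterministic while the pipe case is not.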

Alternative: using awk

One way to avoid the NUL character problem, and to reduce the number of processes required, is to use awk:

$ man curl | awk '/EXIT CODES/,/AUTHORS/{if (/   6   /) print}'
       6      Couldn't resolve host. The given remote host was not resolved.
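If the same lookup is needed for other sections or exit codes, the awk approach can be wrapped in a small function. This is only a sketch: grep_section is a hypothetical name, its three arguments are treated as awk regular expressions, and the sample input is made up for illustration.

```shell
# Hypothetical helper: print lines matching PATTERN that lie between a line
# matching START and the next line matching END (both inclusive).
# Usage: some_command | grep_section START END PATTERN
grep_section() {
  awk -v start="$1" -v end="$2" -v pat="$3" '
    $0 ~ start    { inside = 1 }   # entering the section
    inside && $0 ~ pat { print }   # print matches only inside it
    $0 ~ end      { inside = 0 }   # leaving the section
  '
}

# Made-up input with a decoy match before and after the section.
out=$(printf 'NAME\n  6  not this\nEXIT CODES\n  6  Could not resolve host.\n  7  Failed to connect.\nAUTHORS\n  6  nor this\n' |
  grep_section 'EXIT CODES' 'AUTHORS' '  6  ')
printf '%s\n' "$out"
```

Since awk works line by line, no NUL byte is ever introduced, so nothing downstream has a reason to treat the text as binary.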

Alternative: using sed

$ man curl | sed -n '/EXIT CODES/,/AUTHORS/{/   6   /p}'
       6      Couldn't resolve host. The given remote host was not resolved.

Alternative: using greps and tr

As tripleee suggested, another alternative is to use tr to replace the NUL with a newline:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | tr '\000' '\n' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

For more on "bash - Why does grepping the same man page sometimes give an error?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/47688449/
