unix - grep 为何运行得这么快?

标签 unix grep

我对 shell 中 GREP 的功能感到非常惊讶,之前我在 java 中使用 substring 方法,但现在我使用 GREP,它在几秒钟内执行,它比我使用的 java 代码快得多写。(根据我的经验,我可能是错的)

话虽如此,我还是无法弄清楚这是怎么发生的?网络上也没有太多可用的内容。

有人可以帮我解决这个问题吗?

最佳答案

假设您的问题专门针对GNU grep。以下是作者 Mike Haertel 的注释:

GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it does look at.

GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).

GNU grep uses raw Unix input system calls and avoids copying data after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

So instead of using line-oriented input, GNU grep reads raw data into a large buffer, searches the buffer using Boyer-Moore, and only when it finds a match does it go and look for the bounding newlines (Certain command line options like -n disable this optimization.)

此答案是取自 here 的信息的子集。 。

关于unix - grep 为何运行得这么快?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12629749/

相关文章:

linux - 我可以使用 setlocale() 和 isalpha() 来确定字符是否属于当前语言环境的字母表吗?

json - 如果内容不存在则创建包含内容的文件

grep:上下文长度参数无效

perl - 使用逻辑或从 Perl 中的数组中删除项目

emacs eshell grep on grep 输出

perl - 如何返回 1 级存储在 Perl 变量中的目录路径

bash - 用文件中的多行替换一行

c - 等待所有子进程退出

unix - 操纵电子邮件地址的路径

unix - 计算文件中的空白行数