Regular expression to find consecutive words

In Bash, I want to list and count every word that appears immediately after the word "The".

Example:

The next generation will be ruled by the smartphones. The next thing is interesting to watch.The question is how do we solve this problem

So the expected output is:

next                   2

smartphone             1

question               1

Here is the command I tried:

cat file.txt | tr A-Z a-z |grep 'the '  | cut -d\  -f2| sort |uniq -c|sort -nr

But this command does not give me an accurate result. It outputs words that do not actually appear after the word "the".

Accepted answer

With GNU grep:

grep -Poi 'the \K\w.*?\b' file | sort | uniq -c | awk '{print $2,$1}'

or

grep -Poi 'the \K\w.*?\b' file | awk '{count[$1]++}END{for(j in count) print j, count[j]}'

Output:

next 2
question 1
smartphones 1
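
For context, the command in the question fails because grep prints whole matching lines and cut -f2 then returns the second field of each line, not the word after each "the". The commands above avoid this by printing only the matched part (-o) and discarding "the " with \K. The following is a minimal sketch of a slightly stricter variant, assuming GNU grep with PCRE support (-P). The function name count_after_the is hypothetical, and the leading \b and the lowercasing step are additions that are not part of the original answer.

```shell
#!/usr/bin/env bash
# count_after_the: hypothetical helper name, not from the original answer.
# Assumes GNU grep built with PCRE support (-P).
count_after_the() {
    # Lowercase first so "The Next" and "the next" count as the same word.
    tr 'A-Z' 'a-z' |
        # \b stops "the" from matching inside words such as "bathe";
        # \K drops "the " from the reported match; \w+ is the next word.
        grep -Po '\bthe \K\w+' |
        sort | uniq -c |
        # Highest count first, ties broken alphabetically.
        sort -k1,1nr -k2,2 |
        # Swap columns to print "word count".
        awk '{print $2, $1}'
}

printf '%s\n' 'The next generation will be ruled by the smartphones. The next thing is interesting to watch.The question is how do we solve this problem' | count_after_the
# next 2
# question 1
# smartphones 1
```

The explicit final sort matters mainly if you switch to the awk-only counting variant, since the order in which awk's for (j in count) visits keys is unspecified.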

This question about using a regular expression to find consecutive words is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/49086817/
