linux - How to count repeated sentences in the shell

Tags: linux shell awk sed

cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...

My expected result is as follows:

abc bcd abc ...      

abcd bcde cdef ...  
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I found a way to do this, but my code is rather noisy.

cat file1.txt | uniq -c | sed -e 's/ \+/ /g' -e 's/^.//g' | awk '{print $0," ",$1}'| sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' |sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!> \n/g' -e 's/[1]$//g'

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

Could you suggest a more efficient way to achieve this? Many thanks.

Best answer

A sort + uniq + sed solution (this assumes GNU sed, because \n in the replacement text is a GNU extension):

sort file1.txt | uniq -c | sed -E 's/^ +1 (.+)/\1\n/; 
 s/^ +([2-9]|[0-9]{2,}) (.+)/\2\n<!!! pay attention, the above sentence has repeated \1 times !!!>\n/'

Output:

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, the above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, the above sentence has repeated 2 times !!!>

hig ...
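The sort step matters because uniq only merges identical lines that are adjacent. The sketch below uses made-up input (not from the question) to show the difference; count_adjacent is a hypothetical helper that only strips the padding uniq -c puts before each count.

```shell
#!/bin/sh
# count_adjacent: hypothetical helper, not part of the original answer.
# Runs uniq -c and normalises its padded output to "COUNT LINE".
# Assumes single-word lines, since only the first word after the count is kept.
count_adjacent() {
    uniq -c | awk '{ print $1, $2 }'
}

# "a" occurs twice, but the two occurrences are not adjacent.
printf 'a\nb\na\n' | count_adjacent          # counts runs: 1 a, 1 b, 1 a
printf 'a\nb\na\n' | sort | count_adjacent   # counts totals: 2 a, 1 b
```

The sample file in the question happens to be grouped already, so sorting does not change its output.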

Or using awk:

sort file1.txt | uniq -c | awk '{ n=$1; sub(/^ +[0-9]+ +/,""); 
printf "%s\n%s",$0,(n==1? ORS:"<!!! pay attention, the above sentence has repeated "n" times !!!>\n\n") }'
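If the input must keep its original order, one awk pass can count runs of consecutive identical lines, which is what uniq -c does without sort. This is a minimal sketch rather than part of the original answer; the function name annotate_repeats and the awk function emit are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# annotate_repeats: hypothetical helper, not from the original answer.
# Prints each run of identical consecutive lines once, adds a notice when
# the run is longer than one line, then prints a blank separator line.
annotate_repeats() {
    awk '
        # Print the finished run and, if it repeated, the notice.
        function emit() {
            print prev
            if (count > 1)
                print "<!!! pay attention, the above sentence has repeated " count " times !!!>"
            print ""
        }
        # Appending "" forces a string comparison, so "1" and "1.0" stay distinct.
        NR > 1 && ($0 "") != (prev "") { emit(); count = 0 }
        { prev = $0; count++ }
        # Flush the last run, unless the input was empty.
        END { if (NR > 0) emit() }
    ' "$@"
}
```

It could be called as annotate_repeats file1.txt, or at the end of a pipeline. Unlike the sort-based versions, it counts only adjacent repeats, so identical lines that are separated by other lines are reported as separate runs.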

This question and answer about counting repeated sentences in the shell come from a similar question on Stack Overflow: https://stackoverflow.com/questions/47884428/
