linux - How do I extract all words that start with specific characters from a large file using bash?

I have a very large file that looks like this:

ENST00000629289"; transcript_version "2"; exon_number "22"; gene_name "CDK11B"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CDK11B-208"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSE00001594002"; exon_version "1"; tag "basic"; transcript_support_level "5";
ENST00000629289"; transcript_version "2"; exon_number "22"; gene_name "CDK11B"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CDK11B-208"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSE00001594002"; exon_version "1"; tag "basic"; transcript_support_level "5";
ENST00000629289"; transcript_version "2"; exon_number "22"; gene_name "CDK11B"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CDK11B-208"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSP00000485937"; protein_version "1"; tag "basic"; transcript_support_level "5";

I want to extract all words that start with the characters "ENST". I tried the following command:

 sed 's/.*\(ENST.*transcript_version\)/\1/p'

But it prints every line. Can someone help me fix this?
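The sed attempt prints every line for two reasons. Without -n, sed prints every input line automatically (and the p flag prints each successfully substituted line a second time). Also, s/// replaces only the matched text, so the rest of the line is kept; since these lines already begin with ENST, the substitution leaves them unchanged. Below is a minimal sketch of a corrected sed, assuming the IDs are "ENST" followed by digits as in the sample; print_enst_id is a hypothetical helper name:

```shell
# Hypothetical helper: print the ENST ID from each line that contains one.
# -n turns off sed's automatic printing, so only lines where the
# substitution succeeded are printed, once, by the trailing p flag.
# The trailing .* makes the match cover the whole line, so only the
# captured ID remains. [0-9][0-9]* is portable BRE for "one or more
# digits" (\+ is a GNU extension).
print_enst_id() {
  sed -n 's/.*\(ENST[0-9][0-9]*\).*/\1/p' "$@"
}
```

Usage: print_enst_id file. This prints at most one ID per line (the last one, because the leading .* is greedy).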

Best answer

Use grep with the -o option to print only the matching part:

grep -Po '^ENST.*transcript_version' file
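The pattern above also prints the `"; transcript_version` text that follows each ID. If only the identifiers are wanted, here is a sketch with a tighter pattern. It again assumes IDs are "ENST" followed by digits, and extract_enst_ids is a hypothetical name. -P is not required here, because the pattern needs only extended-regex (-E) features:

```shell
# Hypothetical helper: print every ENST ID found in the input.
# -o prints each match on its own line (so several IDs on one line
# are all reported); -E enables extended regex so that + works.
# Assumes IDs are "ENST" followed by digits, as in the sample data.
extract_enst_ids() {
  grep -oE 'ENST[0-9]+' "$@"
}
```

On a large file where the same transcript ID repeats on many lines, as in the sample, extract_enst_ids file | sort -u lists each distinct ID once.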

Regarding "linux - How do I extract all words that start with specific characters from a large file using bash?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44666619/