This question is a little confusing, so I'll just give an example.
Suppose I have the following:
$ grep -P "locus_tag\tM715_1000193188" Genome.tbl -B1 -A8
193188 193066 gene
locus_tag M715_1000193188
193188 193066 mRNA
product hypothetical protein
protein_id gnl|CorradiLab|M715_1000193188
transcript_id gnl|CorradiLab|M715_mrna1000193188
193188 193066 CDS
product hypothetical protein
protein_id gnl|CorradiLab|M715_1000193188
transcript_id gnl|CorradiLab|M715_mrna1000193188
I want to put a "#" in front of the 8 lines that follow "locus_tag M715_1000193188", so that my modified file looks like this:
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
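Basically, I have a file with about 3000 different locus tags, and for 300 of them I need to comment out the mRNA and CDS features, that is, the 8 lines after the locus_tag line.
Is there a way to do this with sed? The file also contains other kinds of information that must stay unchanged.
Thanks, Adrian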
Best answer
If you can use awk, this should do it:
awk 'f&&f-- {$0="#"$0} /locus_tag/ {f=8} 1' file
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
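As written, the one-liner above triggers on every line containing locus_tag, while the question only wants about 300 of the roughly 3000 tags handled. Below is a minimal sketch of one way to restrict it, assuming the wanted tags sit one per line in a separate file. The function name comment_features and the file name tags.txt are hypothetical and not part of the original answer; the sketch also assumes the tag list is not empty.

```shell
# comment_features TAGS_FILE TBL_FILE
# Prefix "#" to the 8 lines after each locus_tag line whose tag is listed
# in TAGS_FILE (one tag per line). All other lines are printed unchanged.
# Assumption: TAGS_FILE is non-empty; NR == FNR is only true for the first file then.
comment_features() {
    awk '
        # First file: remember every wanted tag, print nothing.
        NR == FNR { if ($1 != "") want[$1] = 1; next }

        # Still inside a block that follows a wanted tag: comment the line out.
        n > 0 { $0 = "#" $0; n-- }

        # A wanted locus_tag line starts a new 8-line countdown.
        # This rule comes after the one above, so the tag line itself stays as is.
        $1 == "locus_tag" && ($2 in want) { n = 8 }

        { print }
    ' "$1" "$2"
}
```

Usage would look something like: comment_features tags.txt Genome.tbl > Genome.commented.tbl. The counter works the same way as f in the one-liner; the only change is that the countdown starts only when the second field of a locus_tag line is in the list.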
Regarding awk - adding "#" in front of the 8 lines after a matching STRING, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29926593/