I have a dataset in the following format:
Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376
Using awk, how can I extract the ReadSequence values, but only from the rows marked Identified (based on the entry in the first column)?
Best answer
$ awk -F"_____" '$1=="Identified" {print $3}' test.in
ReadSequence:1238
ReadSequence:4376
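For readers new to awk, the same filter can be written in long form. This is only a sketch of an equivalent script, not part of the original answer: it recreates the sample data in test.in and sets the field separator in a BEGIN block instead of using -F. The variable name out is chosen here purely for illustration.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  'Identified_____ID#2357_____ReadSequence:1238' \
  'Unknown_____0_____ReadSequence:0979' \
  'Unknown_____0_____ReadSequence:5476' \
  'Identified_____ID#567899_____ReadSequence:4376' > test.in

out=$(awk '
  BEGIN { FS = "_____" }            # fields are separated by five underscores
  $1 == "Identified" { print $3 }   # keep rows whose first field is exactly "Identified"
' test.in)
echo "$out"
```

Because the comparison uses == rather than a regular expression match, a first field such as "Identified2" would not be selected.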
If you only want the ReadSequence id, gsub is your friend:
$ awk -F"_____" '$1=="Identified" {gsub(/^.*:/,"",$3); print $3}' test.in
1238
4376
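As a possible alternative to gsub, the third field can be split on the colon and the second piece printed. This is a minimal sketch that assumes every third field contains exactly one ":" as in the sample data. The array name parts and the variable ids are illustrative.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  'Identified_____ID#2357_____ReadSequence:1238' \
  'Unknown_____0_____ReadSequence:0979' \
  'Unknown_____0_____ReadSequence:5476' \
  'Identified_____ID#567899_____ReadSequence:4376' > test.in

# split() stores the pieces of $3 in the array parts; parts[2] is the text after ":".
ids=$(awk -F'_____' '$1 == "Identified" { split($3, parts, ":"); print parts[2] }' test.in)
echo "$ids"
```

Unlike gsub, split leaves $3 unchanged, which can matter if the rest of the record is printed later.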
Regarding linux - extracting rows from a multi-column file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38476272/