linux - Extract each model from a PDB trajectory into its own file

Tags: linux bash awk protein-database

I have a PDB file that contains a trajectory:

REMARK    GENERATED BY TRJCONV
TITLE     Protein in water t= 400.00000
REMARK    THIS IS A SIMULATION BOX
CRYST1   99.547   99.547   99.547  90.00  90.00  90.00 P 1           1
MODEL        1
ATOM      1  N   PRO A   1      46.850  67.380  57.030  1.00  0.00
ATOM      2  H1  PRO A   1      46.230  66.770  56.500  1.00  0.00
ATOM      3  H2  PRO A   1      46.420  68.290  56.940  1.00  0.00
ATOM      4  CD  PRO A   1      47.060  66.780  58.360  1.00  0.00
TER
ENDMDL
REMARK    GENERATED BY TRJCONV
TITLE     Protein in water t= 800.00000
REMARK    THIS IS A SIMULATION BOX
MODEL       10
ATOM      1  N   PRO A   1      46.850  67.380  57.030  1.00  0.00
ATOM      2  H1  PRO A   1      46.230  66.770  56.500  1.00  0.00
ATOM      3  H2  PRO A   1      46.420  68.290  56.940  1.00  0.00
ATOM      4  CD  PRO A   1      47.060  66.780  58.360  1.00  0.00
TER
ENDMDL
REMARK    GENERATED BY TRJCONV
TITLE     Protein in water t= 1200.00000
REMARK    THIS IS A SIMULATION BOX
MODEL      100
ATOM      1  N   PRO A   1      46.850  67.380  57.030  1.00  0.00
ATOM      2  H1  PRO A   1      46.230  66.770  56.500  1.00  0.00
ATOM      3  H2  PRO A   1      46.420  68.290  56.940  1.00  0.00
ATOM      4  CD  PRO A   1      47.060  66.780  58.360  1.00  0.00
TER
ENDMDL

I want to print the block

MODEL        1
[all info]
TER
ENDMDL

for every model, keeping the file's format. I tried this:

awk '/MODEL        1/,/ENDMDL/'  test.pdb

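But my file is too large to do this by hand for each model. I want to save each model to its own file (model1, model2, and so on), with its coordinate information up to and including ENDMDL.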

Best answer

$ awk '/MODEL/{f="model" $2 ".pdb"} f{print > f} /ENDMDL/ {close(f);f=""}' file
$ cat model1.pdb
MODEL        1
ATOM      1  N   PRO A   1      46.850  67.380  57.030  1.00  0.00
ATOM      2  H1  PRO A   1      46.230  66.770  56.500  1.00  0.00
ATOM      3  H2  PRO A   1      46.420  68.290  56.940  1.00  0.00
ATOM      4  CD  PRO A   1      47.060  66.780  58.360  1.00  0.00
TER
ENDMDL

Explanation:

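/MODEL/ {                 # on a MODEL line
    f="model" $2 ".pdb"   # build the target filename; f also acts as the "inside a model" flag
}
f {                       # while f is set (from MODEL through ENDMDL)
    print > f             # write the current line to the file named by f
}
/ENDMDL/ {                # on the ENDMDL line (already printed by the rule above)
    close(f)              # close the file so open handles do not accumulate
    f=""                  # unset f, so lines between models are skipped
}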
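
As a hedged variant of the one-liner above, the sketch below compares the first field instead of using a regex, so a REMARK or TITLE line that happens to contain the word MODEL cannot start a new output file. The file name sample_traj.pdb and its shortened contents are made up for illustration; substitute your own trajectory file.

```shell
# Hypothetical two-frame input, shortened from the example in the question.
cat > sample_traj.pdb <<'EOF'
REMARK    GENERATED BY TRJCONV
TITLE     Protein in water t= 400.00000
MODEL        1
ATOM      1  N   PRO A   1      46.850  67.380  57.030  1.00  0.00
TER
ENDMDL
REMARK    GENERATED BY TRJCONV
TITLE     Protein in water t= 800.00000
MODEL       10
ATOM      1  N   PRO A   1      46.850  67.380  57.030  1.00  0.00
TER
ENDMDL
EOF

awk '
$1 == "MODEL"  { f = "model" $2 ".pdb" }     # frame starts: choose the output file
f              { print > f }                 # inside a frame: copy the line unchanged
$1 == "ENDMDL" { if (f) close(f); f = "" }   # frame ends: close the file, clear the flag
' sample_traj.pdb

ls model*.pdb
```

Because the output name is built from the model number, two frames that share a number would write to the same file, and the later frame would overwrite the earlier one.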

Source: this question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/42855912/
