pdb - Removing the 'TER' keyword from a PDB file when merging two PDB chains

Tags: pdb biopython

Goal: merge two chains of a PDB file with Biopython. In the example below, I want to merge the two chains A and B into a single chain C.

ATOM   1133  N   VAL A 100      12.484 -30.583 106.831  1.00 30.28           N
ATOM   1134  CA  VAL A 100      11.430 -31.194 106.033  1.00 34.41           C
ATOM   1135  C   VAL A 100      11.985 -32.402 105.259  1.00 39.25           C
ATOM   1136  O   VAL A 100      11.248 -33.126 104.568  1.00 46.37           O
ATOM   1137  CB  VAL A 100      10.822 -30.174 105.029  1.00 35.16           C
ATOM   1138  CG1 VAL A 100      10.159 -29.020 105.767  1.00 36.95           C
ATOM   1139  CG2 VAL A 100      11.865 -29.669 104.007  1.00 30.60           C
TER
ATOM   1141  N   GLU B   1      12.344 -43.792 102.987  1.00 64.25           N
ATOM   1142  CA  GLU B   1      11.253 -42.785 103.240  1.00 66.15           C
ATOM   1143  C   GLU B   1      11.742 -41.350 102.948  1.00 65.40           C
ATOM   1144  O   GLU B   1      12.011 -40.595 103.895  1.00 65.31           O
ATOM   1145  CB  GLU B   1      10.779 -42.877 104.712  1.00 67.04           C

The following lines of code merge them into a single chain, but they do not remove the TER keyword.

merged_chains=['A', 'B']
new_rsd_num = 1
for model in structure:
  for chain in model:
    if chain.id in merged_chains:
      chain.id = 'C'
      for residue in chain:
        residue.id = (' ', new_rsd_num, ' ')
        new_rsd_num += 1

This code produces the following output, which still contains a TER keyword between the two chains.

...
ATOM   1133  N   VAL C 100      12.484 -30.583 106.831  1.00 30.28           N
ATOM   1134  CA  VAL C 100      11.430 -31.194 106.033  1.00 34.41           C
ATOM   1135  C   VAL C 100      11.985 -32.402 105.259  1.00 39.25           C
ATOM   1136  O   VAL C 100      11.248 -33.126 104.568  1.00 46.37           O
ATOM   1137  CB  VAL C 100      10.822 -30.174 105.029  1.00 35.16           C
ATOM   1138  CG1 VAL C 100      10.159 -29.020 105.767  1.00 36.95           C
ATOM   1139  CG2 VAL C 100      11.865 -29.669 104.007  1.00 30.60           C
TER
ATOM   1141  N   GLU C 101      12.344 -43.792 102.987  1.00 64.25           N
ATOM   1142  CA  GLU C 101      11.253 -42.785 103.240  1.00 66.15           C
ATOM   1143  C   GLU C 101      11.742 -41.350 102.948  1.00 65.40           C
ATOM   1144  O   GLU C 101      12.011 -40.595 103.895  1.00 65.31           O
ATOM   1145  CB  GLU C 101      10.779 -42.877 104.712  1.00 67.04           C
...

The desired output is the following, with the TER keyword removed.

...
ATOM   1133  N   VAL C 100      12.484 -30.583 106.831  1.00 30.28           N
ATOM   1134  CA  VAL C 100      11.430 -31.194 106.033  1.00 34.41           C
ATOM   1135  C   VAL C 100      11.985 -32.402 105.259  1.00 39.25           C
ATOM   1136  O   VAL C 100      11.248 -33.126 104.568  1.00 46.37           O
ATOM   1137  CB  VAL C 100      10.822 -30.174 105.029  1.00 35.16           C
ATOM   1138  CG1 VAL C 100      10.159 -29.020 105.767  1.00 36.95           C
ATOM   1139  CG2 VAL C 100      11.865 -29.669 104.007  1.00 30.60           C
ATOM   1141  N   GLU C 101      12.344 -43.792 102.987  1.00 64.25           N
ATOM   1142  CA  GLU C 101      11.253 -42.785 103.240  1.00 66.15           C
ATOM   1143  C   GLU C 101      11.742 -41.350 102.948  1.00 65.40           C
ATOM   1144  O   GLU C 101      12.011 -40.595 103.895  1.00 65.31           O
ATOM   1145  CB  GLU C 101      10.779 -42.877 104.712  1.00 67.04           C
...

Any ideas on how to remove the TER keyword using Biopython?

Best answer

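The residues still belong to their original chain objects: overwriting the chain id does not change how many residues belong to chain A. PDBIO writes a TER record after each chain object it saves, so two chain objects yield a TER between them even when both carry the same id.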
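As a rough illustration of that point, the sketch below builds a tiny two-chain structure in memory and counts the TER records that `PDBIO` writes. It assumes Biopython is installed. The `atom_line` helper and the four-atom structure are made up for this example and are not part of the original question.

```python
from io import StringIO

from Bio import PDB


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column PDB ATOM record (hypothetical helper)."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {resname:>3s} {chain}{resseq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00 30.00{element:>12s}"
    )


# Two chains (A and B) with one residue each, using coordinates from the question
pdb_text = "\n".join([
    atom_line(1, " N  ", "VAL", "A", 100, 12.484, -30.583, 106.831, "N"),
    atom_line(2, " CA ", "VAL", "A", 100, 11.430, -31.194, 106.033, "C"),
    "TER",
    atom_line(3, " N  ", "GLU", "B", 1, 12.344, -43.792, 102.987, "N"),
    atom_line(4, " CA ", "GLU", "B", 1, 11.253, -42.785, 103.240, "C"),
    "END",
]) + "\n"

structure = PDB.PDBParser(QUIET=True).get_structure("demo", StringIO(pdb_text))
n_chains = len(list(structure[0]))

# Write the structure back out to an in-memory handle and count TER records
out = StringIO()
writer = PDB.PDBIO()
writer.set_structure(structure)
writer.save(out)
n_ter = sum(line.startswith("TER") for line in out.getvalue().splitlines())

print(n_chains, n_ter)
```

The number of TER records follows the number of chain objects, so getting rid of the interior TER means ending up with a single chain object, not just a single chain id.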

You can add the residues of chain B to chain A and then delete chain B.

# Read a PDB file with two chains
from Bio import PDB
pdbl = PDB.PDBList()
# Request the legacy PDB format explicitly; retrieve_pdb_file returns the local file path
pdb_path = pdbl.retrieve_pdb_file('5K04', file_format='pdb')
parser = PDB.PDBParser()
structure = parser.get_structure('5K04', pdb_path)

#get all chains
chains = list()
for model in structure:
  for chain in model:
    chains.append(chain)

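# Next free residue number: one past the last residue of the first chain
len_chain_a = int(chains[0].get_unpacked_list()[-1].id[1]) + 1

# Move every residue of the 2nd chain into the first chain
for i, residue in enumerate(list(chains[1])):
    # Detach first so the new id cannot clash with ids still present in the 2nd chain
    chains[1].detach_child(residue.id)
    old_id = list(residue.id)
    old_id[1] = len_chain_a + i
    # Renumber so the residue continues the first chain's numbering
    residue.id = tuple(old_id)
    # Add the residue to the first chain
    chains[0].add(residue)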

# Now delete all chains but the first
for model in structure:
    # Iterate over a copy, because detaching modifies the model's child list
    for chain in list(model):
        if chain.id != 'A':
            model.detach_child(chain.id)

#save the merged chains
pdb_io = PDB.PDBIO()
pdb_io.set_structure(structure)
pdb_io.save('5k04_merged.pdb')
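The same idea can be wrapped in a small reusable function. The following is only a sketch under stated assumptions: `merge_chains` and `atom_line` are hypothetical helpers written for this example, the tiny structure is made up, and numbering simply continues after the highest residue number in the target chain. It also renames the merged chain to 'C', as asked in the question, and checks that only one TER record remains.

```python
from io import StringIO

from Bio import PDB


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column PDB ATOM record (hypothetical helper)."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {resname:>3s} {chain}{resseq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00 30.00{element:>12s}"
    )


def merge_chains(model, target_id, source_ids):
    """Move all residues of the chains in source_ids into chain target_id."""
    target = model[target_id]
    # Continue numbering after the highest residue number already in the target
    next_num = max(res.id[1] for res in target) + 1
    for chain_id in source_ids:
        source = model[chain_id]
        for residue in list(source):  # copy, because we detach while looping
            source.detach_child(residue.id)
            het_flag, _, icode = residue.id
            residue.id = (het_flag, next_num, icode)
            target.add(residue)
            next_num += 1
        # The source chain is empty now; drop it so no extra TER is written
        model.detach_child(chain_id)


pdb_text = "\n".join([
    atom_line(1, " N  ", "VAL", "A", 100, 12.484, -30.583, 106.831, "N"),
    atom_line(2, " CA ", "VAL", "A", 100, 11.430, -31.194, 106.033, "C"),
    "TER",
    atom_line(3, " N  ", "GLU", "B", 1, 12.344, -43.792, 102.987, "N"),
    atom_line(4, " CA ", "GLU", "B", 1, 11.253, -42.785, 103.240, "C"),
    "END",
]) + "\n"

structure = PDB.PDBParser(QUIET=True).get_structure("demo", StringIO(pdb_text))
model = structure[0]
merge_chains(model, "A", ["B"])
model["A"].id = "C"  # rename the merged chain; 'C' is unused in this model

out = StringIO()
writer = PDB.PDBIO()
writer.set_structure(structure)
writer.save(out)
lines = out.getvalue().splitlines()

n_ter = sum(line.startswith("TER") for line in lines)
chain_ids = {line[21] for line in lines if line.startswith("ATOM")}
residue_numbers = [res.id[1] for res in list(model)[0]]
print(n_ter, chain_ids, residue_numbers)
```

Detaching each residue before renumbering it avoids id clashes inside the source chain, which could otherwise occur when the new numbers overlap with numbers that have not been reassigned yet.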

This question, "Removing the 'TER' keyword from a PDB file when merging two PDB chains", comes from Stack Overflow: https://stackoverflow.com/questions/43211047/
