python - Biopython: resseq does not match the PDB file

Tags: python bioinformatics biopython

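I have a PDB file from which I need to extract the residue sequence numbers (resseqs). From a manual inspection of the first few lines of the file (pasted below), I believe the resseqs should be [22, 23, ...]. However, Biopython's Bio.PDB module reports something different (output also attached below). Is this a bug in Biopython, or am I misunderstanding the PDB format?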

ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  
ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C  
ATOM      3  C   GLY A  22      80.438  89.415  58.391  1.00 21.89           C  
ATOM      4  O   GLY A  22      80.362  90.202  57.440  1.00 23.18           O  
ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  
ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C  
ATOM      7  C   LEU A  23      83.288  89.020  57.162  1.00 22.41           C  
ATOM      8  O   LEU A  23      82.889  87.923  56.788  1.00 22.93           O  
ATOM      9  CB  LEU A  23      83.973  89.232  59.560  1.00 20.97           C  
ATOM     10  CG  LEU A  23      84.225  87.818  60.062  1.00 13.32           C  
ATOM     11  CD1 LEU A  23      85.448  87.888  60.939  1.00 15.24           C  
ATOM     12  CD2 LEU A  23      83.035  87.258  60.829  1.00 12.21           C

The code I use to extract the resseqs:

...
for i in chain:
    print(i.get_full_id())

OUT:('pdb', 0, 'A', (' ', 2, ' '))
    ('pdb', 0, 'A', (' ', 3, ' '))
...

Best answer

From the documentation of Bio.PDB.Entity.get_full_id:

def get_full_id(self):
    """Return the full id.

    The full id is a tuple containing all id's starting from
    the top object (Structure) down to the current object. A full id for
    a Residue object e.g. is something like:

    ("1abc", 0, "A", (" ", 10, "A"))

    This corresponds to:

    Structure with id "1abc"
    Model with id 0
    Chain with id "A"
    Residue with id (" ", 10, "A")

    The Residue id indicates that the residue is not a hetero-residue
    (or a water) because it has a blank hetero field, that its sequence
    identifier is 10 and its insertion code "A".
    """
    # The function implementation below here ...
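
To make the tuple layout described in the docstring concrete, here is a minimal sketch that unpacks the docstring's own example with plain Python. It does not call Biopython; the variable names (structure_id, hetfield, resseq, icode) are chosen here for illustration only.

```python
# The example full id quoted in the docstring above.
full_id = ("1abc", 0, "A", (" ", 10, "A"))

# Outer tuple: structure id, model id, chain id, residue id.
structure_id, model_id, chain_id, residue_id = full_id

# Residue id: hetero field, sequence number (resseq), insertion code.
hetfield, resseq, icode = residue_id

print(resseq)       # the residue sequence number
print(repr(icode))  # the insertion code
```

The resseq is therefore the middle element of the last tuple, which is the value to compare against the number seen in the PDB file.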

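I assume you are iterating over the atoms of the chain rather than over its residues, which would give you the full id of each Atom instead of each Residue.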

If you save the example residues in a file named struct.pdb and run the code below, you get the correct ids:

>>> from Bio.PDB import PDBParser
>>> structure = PDBParser().get_structure('test', 'struct.pdb')
>>> for residue in structure.get_residues():
...    print(residue.get_full_id())
('test', 0, 'A', (' ', 22, ' '))
('test', 0, 'A', (' ', 23, ' '))
>>> resseqs = [residue.id[1] for residue in structure.get_residues()]
>>> print(resseqs)
[22, 23]
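
As a cross-check that does not depend on Biopython, the residue numbers can also be read straight from the fixed-width columns of the ATOM records. This is only a sketch: it assumes well-formed lines in the standard PDB layout (chain ID in column 22, resSeq in columns 23-26, insertion code in column 27), and the helper name residue_keys is made up for this example.

```python
def residue_keys(pdb_lines):
    """Return distinct (chain, resseq, icode) keys in order of appearance."""
    seen = []
    for line in pdb_lines:
        # Only coordinate records carry residue numbers.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # PDB columns are 1-based; Python slices are 0-based.
        key = (line[21], int(line[22:26]), line[26])
        if key not in seen:
            seen.append(key)
    return seen


# A few of the records from the question.
sample = [
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N",
    "ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C",
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N",
    "ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C",
]

keys = residue_keys(sample)
resseqs = [resseq for _chain, resseq, _icode in keys]
print(resseqs)
```

If the numbers obtained this way differ from what Bio.PDB reports, the file being parsed is probably not the one that was inspected by hand.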

For python - Biopython: resseq does not match the PDB file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45466408/
