mdanalysis - 如何使用 MDAnalysis 计算蛋白质的质心?

标签 mdanalysis

我的处境有点不寻常。根据残基名称,有七种不同的蛋白质存储在一个文件中。每种蛋白质具有不同的序列长度。现在我需要计算每个蛋白质的质心并生成时间序列数据。我知道如何处理单个蛋白质,但不知道如何处理多个蛋白质系统。对于单一蛋白质我可以这样做:

import MDAnalysis as mda
import numpy as np

u = mda.Universe('lp400start.gro')
u1 = mda.Merge(u.select_atoms("not resname W and not resname WF and not resname ION"))
u1.load_new('lp400.xtc')
protein = u1.select_atoms("protein")
arr = np.empty((protein.n_residues, u1.trajectory.n_frames, 3))

for ts in u.trajectory:
    arr[:, ts.frame] = protein.center_of_mass(compound='residues')

数据文件公开 here 。可以使用 grep "^ *1[A-Z]" -B1 lp400final.gro | grep -v "^ *1[A-Z]" 检查拓扑文件中的残基序列。 ,输出:

 38ALA     BB   74  52.901  33.911   6.318
--
   38ALA     BB  148  41.842  29.381   7.211
--
  137GLY     BB  455  36.756   4.287   3.284
--
  137GLY     BB  762  44.721  60.377   3.112
--
  252HIS    SC3 1368  28.682  37.936   6.727
--
  252HIS    SC3 1974  18.533  46.506   6.314
--
  576PHE    SC3 3263  48.937  38.538   4.013
--
  576PHE    SC3 4552  18.513  25.948   3.800
--
 1092PRO    SC1 6470  42.510  40.992   6.775
--
 1092PRO    SC1 8388  14.709   4.759   6.370
--
 1016LEU    SC110524  57.264  56.308   2.632
--
 1016LEU    SC112660  50.716  14.698   2.728
--
 1285LYS    SC215345   0.793  33.529   1.509

第一个蛋白质的序列长度为 38 个残基,并且有自己的副本,然后是第二个蛋白质,依此类推。现在我想要获得每个蛋白质在每个时间范围内的 COM 并将其构建成一个时间序列。除了蛋白质之外,拓扑文件还包含 DPPC 颗粒。 Could someone help me how to do this?谢谢!

为了确保输出轨迹正确,它看起来像这样 enter link description here

最佳答案

我将从 TPR 文件加载系统以维护债券信息。然后 MDAnalysis 可以确定片段(即您的蛋白质)。然后循环片段以确定 COM 时间序列:

import MDAnalysis as mda
import numpy as np

# files from https://doi.org/10.5281/zenodo.846428
TPR = "lp400.tpr"
XTC = "lp400.xtc"

# build reduced universe to match XTC
# (ignore warnings that no coordinates are found for the TPR)
u0 = mda.Universe(TPR)
u = mda.Merge(u0.select_atoms("not resname W and not resname WF and not resname ION"))
u.load_new(XTC)

# segments (exclude the last one, which is DPPC and not protein)
protein_segments = u.segments[:-1]

# build the fragments
# (a dictionary with the key as the protein name -- I am using an
# OrderedDict so that the order is the same as in the TPR)
from collections import OrderedDict
protein_fragments = OrderedDict((seg.segid[6:], seg.atoms.fragments) for seg in protein_segments)

# analyze trajectory (with a nice progress bar)
timeseries = []
for ts in mda.log.ProgressBar(u.trajectory):
    coms = []
    for name, proteins in protein_fragments.items():
        # loop over all the different proteins;
        # unwrap to get the true COM under PBC (double check!!)
        coms.extend([p.center_of_mass(unwrap=True) for p in proteins]) 
    timeseries.append(coms)
timeseries = np.array(timeseries)

注意

  • 仔细检查 unwrap=True 是否正在做正确的事情(并且这是必要的 - 我没有检查是否有任何蛋白质跨越周期性边界 split )。
  • 解包速度很慢;如果你不需要它,它会运行得更快。
  • 生成的数组是形状为 (N_timesteps, M_ Proteins, 3) 的 3d 数组,即 (10001, 14, 3)
  • 蛋白质片段的内容是
    OrderedDict([('EPHA', (<AtomGroup with 74 atoms>, <AtomGroup with 74 atoms>)),
               ('OMPA', (<AtomGroup with 307 atoms>, <AtomGroup with 307 atoms>)),
               ('OMPG', (<AtomGroup with 606 atoms>, <AtomGroup with 606 atoms>)),
               ('BTUB', (<AtomGroup with 1289 atoms>, <AtomGroup with 1289 atoms>)),
               ('ATPS', (<AtomGroup with 1918 atoms>, <AtomGroup with 1918 atoms>)),
               ('GLPF', (<AtomGroup with 2136 atoms>, <AtomGroup with 2136 atoms>)),
               ('FOCA', (<AtomGroup with 2685 atoms>, <AtomGroup with 2685 atoms>))])
    

关于mdanalysis - 如何使用 MDAnalysis 计算蛋白质的质心?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/71295833/

相关文章:

python - 如何高效迭代MDAnalysis轨迹并保存残差属性时间序列?

python - 使用MDA分析的PCA(python3.7)

python - 删除某个版本的 python 包

python - 如何使用 MDAnalysis 分析一组原子的principal_axes 和 moment_of_inertia?

python - 使用 MDAnalysis 获取径向分布函数

python - 如何使用 MDAnalysis 实例化 Atom?