python - How to reduce the size of a 5-level nested for loop

Tags: python optimization refactoring

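I wrote some code that translates an RNA sequence into a peptide, and I need to reduce its indentation depth to save space and improve readability.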

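In biology, translation essentially means reading a sequence of letters (usually RNA) three at a time and assigning an amino acid to each triplet according to a table.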

The step that takes the sequence and splits it into triplets works well and has already been refactored,

seq_codons = [sequence[i:i+3] for i in range((-1 + frame), len(sequence), 3)]

but the rest is just a huge dictionary and a ridiculous 5-level nested for loop that works but is far from optimized.

The full code is below:

sequence = 'ACUGAUCUGAGACGUCAUCGUAGCAUCGU'


def translation(sequence, frame=1):  # frame: 1-based position at which to start
    codons_table = {                 # counting triplets (A, C or U in the example)
        "CYS": ("UGU", "UGC",),
        "GLN": ("CAA", "CAG",),
        "GLU": ("GAA", "GAG",),
        "GLY": ("GGU", "GGC", "GGA", "GGG",),
        "HIS": ("CAU", "CAC",),
        "ILE": ("AUU", "AUC", "AUA",),
        "LEU": ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG",),
        "LYS": ("AAA", "AAG",),
        "MET": ("AUG",),
        "PHE": ("UUU", "UUC",),
        "PRO": ("CCU", "CCC", "CCA", "CCG",),
        "SER": ("UCU", "UCC", "UCA", "UCG", "AGU", "AGC",),
        "THR": ("ACU", "ACC", "ACA", "ACG",),
        "TRP": ("UGG",),
        "TYR": ("UAU", "UAC",),
        "VAL": ("GUU", "GUC", "GUA", "GUG",),
        "STOP": ("UAG", "UGA", "UAA",),
        "ASP": ("GAU", "GAC",),
        "ASN": ("AAU", "AAC",),
        "ARG": ("CGU", "CGC", "CGA", "CGG", "AGA", "AGG",),
        "ALA": ("GCU", "GCC", "GCA", "GCG",)
    }
    seq_codons = [sequence[i:i+3] for i in range((-1 + frame), len(sequence), 3)]
    print(seq_codons)
    peptide = []

    for codon in seq_codons:
        for amino_acid, table_codon in zip(codons_table, codons_table.values()):
            if len(table_codon) > 1:
                for single_codon in table_codon:
                    if single_codon == codon:
                        peptide.append(amino_acid)
                    else:
                        pass
            else:
                if table_codon[0] == codon:
                    peptide.append(amino_acid)
                else:
                    pass

    return peptide

print(translation(sequence))

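I would like to know whether there is a way to reduce the size of that last for loop, and whether there is a better way to store the data than a dictionary.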

Best answer

I suggest remapping codons_table this way, so that each codon can be looked up directly (print codons_map to see what I mean):

codons_map = {}
for k, v in codons_table.items():
  for item in v:
    codons_map[item] = k
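To make the remapping concrete, here is a minimal sketch that applies the same loop to only two entries of the table (the reduced table is an assumption made for brevity, not the full genetic code). Each codon becomes a key pointing at its amino acid, so finding the amino acid is a single dictionary lookup instead of a scan over every tuple.

```python
# Reduced table for illustration only; the real table has 21 entries.
codons_table = {
    "MET": ("AUG",),
    "CYS": ("UGU", "UGC"),
}

# Invert the mapping: amino acid -> codons becomes codon -> amino acid.
codons_map = {}
for amino_acid, codons in codons_table.items():
    for codon in codons:
        codons_map[codon] = amino_acid

print(codons_map)
#=> {'AUG': 'MET', 'UGU': 'CYS', 'UGC': 'CYS'}
```

Note that one-element tuples such as ("AUG",) need no special case here, because iterating over them works the same way as for longer tuples.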

Then split the string into groups of three, as you already do:

sequence = 'ACUGAUCUGAGACGUCAUCGUAGCAUCGU'
seq_codons = [sequence[i:i+3] for i in range(0, len(sequence), 3)]

Finally, iterate over seq_codons:

peptide = []
for item in seq_codons:
  if len(item) == 3:
    peptide.append(codons_map[item])

print(peptide)
#=> ['THR', 'ASP', 'LEU', 'ARG', 'ARG', 'HIS', 'ARG', 'SER', 'ILE']
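The len(item) == 3 check is there because the example sequence has 29 letters, so the last slice is an incomplete codon that is not a key in codons_map. As a hedged alternative (not part of the original answer), the incomplete tail can be avoided at slicing time by stopping the range two positions early:

```python
sequence = 'ACUGAUCUGAGACGUCAUCGUAGCAUCGU'  # 29 letters: 9 codons plus 2 left over

# Slicing up to len(sequence) keeps the incomplete trailing piece.
with_tail = [sequence[i:i+3] for i in range(0, len(sequence), 3)]
print(with_tail[-1])
#=> GU

# Stopping at len(sequence) - 2 yields only complete triplets.
full_only = [sequence[i:i+3] for i in range(0, len(sequence) - 2, 3)]
print(len(full_only), full_only[-1])
#=> 9 AUC
```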

Short version:

codons_map = {item: k for k, v in codons_table.items() for item in v}
seq_codons = [sequence[i:i+3] for i in range(0, len(sequence), 3)]
peptide = [codons_map[item] for item in seq_codons if len(item) == 3]

print(peptide)

#=> ['THR', 'ASP', 'LEU', 'ARG', 'ARG', 'HIS', 'ARG', 'SER', 'ILE']
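Putting the pieces together, the following is one possible sketch of the whole function that keeps the frame parameter from the question. The names CODONS_TABLE, CODON_TO_AMINO and translate are illustrative choices, not from the original posts, and the sketch assumes every complete triplet is a valid RNA codon (an unknown triplet would raise KeyError).

```python
CODONS_TABLE = {
    "CYS": ("UGU", "UGC"),
    "GLN": ("CAA", "CAG"),
    "GLU": ("GAA", "GAG"),
    "GLY": ("GGU", "GGC", "GGA", "GGG"),
    "HIS": ("CAU", "CAC"),
    "ILE": ("AUU", "AUC", "AUA"),
    "LEU": ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG"),
    "LYS": ("AAA", "AAG"),
    "MET": ("AUG",),
    "PHE": ("UUU", "UUC"),
    "PRO": ("CCU", "CCC", "CCA", "CCG"),
    "SER": ("UCU", "UCC", "UCA", "UCG", "AGU", "AGC"),
    "THR": ("ACU", "ACC", "ACA", "ACG"),
    "TRP": ("UGG",),
    "TYR": ("UAU", "UAC"),
    "VAL": ("GUU", "GUC", "GUA", "GUG"),
    "STOP": ("UAG", "UGA", "UAA"),
    "ASP": ("GAU", "GAC"),
    "ASN": ("AAU", "AAC"),
    "ARG": ("CGU", "CGC", "CGA", "CGG", "AGA", "AGG"),
    "ALA": ("GCU", "GCC", "GCA", "GCG"),
}

# Built once at import time: codon -> amino acid.
CODON_TO_AMINO = {
    codon: amino_acid
    for amino_acid, codons in CODONS_TABLE.items()
    for codon in codons
}


def translate(sequence, frame=1):
    """Translate an RNA string, starting at the 1-based position `frame`."""
    start = frame - 1
    # Stop two positions early so that only complete triplets are looked up.
    return [
        CODON_TO_AMINO[sequence[i:i+3]]
        for i in range(start, len(sequence) - 2, 3)
    ]


sequence = 'ACUGAUCUGAGACGUCAUCGUAGCAUCGU'
print(translate(sequence))
#=> ['THR', 'ASP', 'LEU', 'ARG', 'ARG', 'HIS', 'ARG', 'SER', 'ILE']
print(translate(sequence, frame=2))
#=> ['LEU', 'ILE', 'STOP', 'ASP', 'VAL', 'ILE', 'VAL', 'ALA', 'SER']
```

Keeping the table at module level means the inverted dictionary is built once rather than on every call, and the function body has a single level of nesting.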

Regarding python - How to reduce the size of a 5-level nested for loop, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53889519/
