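I'm trying to optimize my code because it becomes very slow when I load huge dictionaries. I think this is because of the key lookups in the dictionary. I've been reading about Python's defaultdict and think it could be a good improvement, but I haven't managed to implement it here. As you can see, this is a hierarchical dictionary structure. Any hints would be appreciated.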
class Species:
    '''This structure contains all the information needed for all genes.
    One species has several genes, one gene has several proteins'''
    def __init__(self, name):
        self.name = name  # name of the species
        self.genes = {}

    def addProtein(self, gene, protname, len):
        # Converting a line from the input file into a protein and/or an exon
        if gene in self.genes:
            # Gene already in the structure
            self.genes[gene].proteins[protname] = Protein(protname, len)
            self.genes[gene].updateProts()
        else:
            self.genes[gene] = Gene(gene)
            self.updateNgenes()
            self.genes[gene].proteins[protname] = Protein(protname, len)
            self.genes[gene].updateProts()

    def updateNgenes(self):
        # Updating the number of genes
        self.ngenes = len(self.genes.keys())
Gene and Protein are defined as:
class Protein:
    # The class Protein contains information about the length of the protein
    # and a list with its exons (with their own attributes)
    def __init__(self, name, len):
        self.name = name
        self.len = len

class Gene:
    # The class Gene contains information about the gene
    # and a dict with its proteins (with their own attributes)
    def __init__(self, name):
        self.name = name
        self.proteins = {}
        self.updateProts()

    def updateProts(self):
        # Update number of proteins
        self.nproteins = len(self.proteins)
Best answer
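You can't use defaultdict directly here, because its default factory is called with no arguments, while your Gene.__init__ method requires the gene name.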
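If defaultdict-style automatic creation is still wanted, one option is a dict subclass that defines `__missing__`, which Python calls with the missing key on a failed `d[key]` lookup. The following is only a minimal sketch: `GeneDict` is a hypothetical helper name that is not part of the original code, `Gene` is reduced to the two attributes needed here, and the gene and protein names are made up.

```python
class Gene:
    def __init__(self, name):
        self.name = name
        self.proteins = {}


class GeneDict(dict):
    """Hypothetical helper: a dict that builds a Gene for any missing key."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when `key` is absent.
        gene = Gene(key)
        self[key] = gene
        return gene


genes = GeneDict()
genes["geneA"].proteins["protA1"] = 120  # Gene("geneA") is created on first access
genes["geneA"].proteins["protA2"] = 95   # the same Gene object is reused
```

Membership tests such as `"geneB" in genes` do not trigger `__missing__`, so checking for a gene does not create one.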
This may be one of your bottlenecks:
def updateNgenes(self):
    # Updating the number of genes
    self.ngenes = len(self.genes.keys())
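In Python 2, len(self.genes.keys()) builds a list of all the keys before computing its length. This means that every time you add a gene, you create a list and throw it away, and the more genes you have, the more expensive that list becomes to build. To avoid the intermediate list, just call len(self.genes). (In Python 3, keys() returns a view rather than a list, so no copy is made, but len(self.genes) is still the simpler spelling.)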
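The difference can be measured with a rough timing sketch like the one below. It assumes Python 3, where `list(d.keys())` is used to emulate the list that Python 2's `keys()` would build. The function names and the dictionary contents are made up for illustration, and the timings will vary by machine.

```python
import timeit


def count_with_key_list(d):
    # Emulates Python 2's dict.keys(): builds a full list of keys first.
    return len(list(d.keys()))


def count_directly(d):
    # Asks the dict for its size; no intermediate list is built.
    return len(d)


genes = {"gene%d" % i: None for i in range(100000)}

# Both approaches agree on the count; only the cost differs.
same_count = count_with_key_list(genes) == count_directly(genes)

slow = timeit.timeit(lambda: count_with_key_list(genes), number=20)
fast = timeit.timeit(lambda: count_directly(genes), number=20)
print("key list: %.4fs, direct len: %.4fs" % (slow, fast))
```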
A better approach is to make ngenes a property, so it is computed only when you need it.
@property
def ngenes(self):
    return len(self.genes)
The same can be done for nproteins in the Gene class.
Here is the refactored code:
class Species:
    '''This structure contains all the information needed for all genes.
    One species has several genes, one gene has several proteins'''
    def __init__(self, name):
        self.name = name  # name of the species
        self.genes = {}

    def addProtein(self, gene, protname, len):
        # Converting a line from the input file into a protein and/or an exon
        if gene not in self.genes:
            self.genes[gene] = Gene(gene)
        self.genes[gene].proteins[protname] = Protein(protname, len)

    @property
    def ngenes(self):
        return len(self.genes)

class Protein:
    # The class Protein contains information about the length of the protein
    # and a list with its exons (with their own attributes)
    def __init__(self, name, len):
        self.name = name
        self.len = len

class Gene:
    # The class Gene contains information about the gene
    # and a dict with its proteins (with their own attributes)
    def __init__(self, name):
        self.name = name
        self.proteins = {}

    @property
    def nproteins(self):
        return len(self.proteins)
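As a usage sketch, the refactored classes could be driven by a simple loading loop. The question does not show the input file, so the tab-separated "gene, protein, length" format and all names below are hypothetical. The classes are repeated in compact form only so that the example runs on its own.

```python
class Protein:
    def __init__(self, name, len):
        self.name = name
        self.len = len


class Gene:
    def __init__(self, name):
        self.name = name
        self.proteins = {}

    @property
    def nproteins(self):
        return len(self.proteins)


class Species:
    def __init__(self, name):
        self.name = name
        self.genes = {}

    def addProtein(self, gene, protname, len):
        if gene not in self.genes:
            self.genes[gene] = Gene(gene)
        self.genes[gene].proteins[protname] = Protein(protname, len)

    @property
    def ngenes(self):
        return len(self.genes)


# Hypothetical input: one "gene<TAB>protein<TAB>length" record per line.
lines = [
    "geneA\tprotA1\t120",
    "geneA\tprotA2\t95",
    "geneB\tprotB1\t300",
]

species = Species("example_species")
for line in lines:
    gene, protname, length = line.split("\t")
    species.addProtein(gene, protname, int(length))

print(species.ngenes)                    # 2
print(species.genes["geneA"].nproteins)  # 2
```

Because the counts are properties, nothing is recomputed while loading. The lengths are only taken when ngenes or nproteins is actually read.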
For more on optimizing key lookups in a hierarchical Python dictionary, see the similar question on Stack Overflow: https://stackoverflow.com/questions/13630906/