list - Trying to get taxonomic information from Biopython

Tags: list loops iteration bioinformatics biopython

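I am trying to modify an earlier script that uses Biopython to fetch information about a species' phylum. The script was written to retrieve information for one species at a time. I would like to modify it so that I can do this for 100 organisms at once.
Here is the initial code: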

import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid.  we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(" ", "+").strip()
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print("you must add your email address")
    sys.exit(2)
taxid = get_tax_id("Erodium carvifolium")
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in 
    data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}

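I managed to modify the script so that it accepts a local file containing one of the organisms I am working with. But I need to extend this to 100 organisms.
So my idea was to generate a list from my organisms file and somehow feed each item of that list, one at a time, into the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with my organism names. But I don't know how to do that.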
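
The first half of that idea, turning the organisms file into a Python list, could be sketched roughly as follows. This assumes a plain-text file with one organism name per line; the function name read_species_list and the file name organisms.txt are hypothetical and not part of the original script.

```python
def read_species_list(path):
    """Return the organism names found in a text file, one name per line.

    Assumption: blank lines and lines starting with '#' carry no name
    and are skipped.
    """
    species = []
    with open(path) as handle:
        for line in handle:
            name = line.strip()  # drop the newline and surrounding spaces
            if name and not name.startswith("#"):
                species.append(name)
    return species

# Hypothetical usage:
# species_list = read_species_list("organisms.txt")
```

Each name in the returned list is an ordinary string, so it can be passed to get_tax_id in place of the hard-coded "Erodium carvifolium".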

Here is a sample version of my script, with some adjustments:
import sys
from Bio import Entrez


def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(' ', "+").strip()
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print("you must add your email address")
    sys.exit(2)
list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
i = iter(list)
item = i.next()
for item in list:
     ???
taxid = get_tax_id(?)
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in
    data[0]['LineageEx'] if d['Rank'] in ['phylum']}
print lineage, taxid

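The question marks show where I am stuck on what to do next. I don't know how to connect my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow append to each item in the list, so that each one is modified to read get_tax_id(Helicobacter pylori 26695), and then find some way to place them in the line containing taxid =?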

Best answer

This is what you need. Put it below your function definitions, i.e. after the line: sys.exit(2)

species_list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']

taxid_list = [] # Initiate the lists to store the data to be parsed in
data_list = []
lineage_list = []

print('parsing taxonomic data...') # message declaring the parser has begun

for species in species_list:
    print('\t' + species) # progress messages

    taxid = get_tax_id(species) # Apply your functions
    data = get_tax_data(taxid)
    lineage = {d['Rank']:d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['phylum']}

    taxid_list.append(taxid) # Append the data to lists already initiated
    data_list.append(data)
    lineage_list.append(lineage)

print('complete!')
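
One thing to watch with a 100-organism run: get_tax_id indexes record['IdList'][0] unconditionally, so the loop above stops with an IndexError at the first name for which the search returns no IDs. A minimal sketch of one way to keep going instead is shown below. The helper name collect_phyla is hypothetical, and the two lookup functions are passed in as arguments (an assumption made here so the sketch can be tried without network access); they are expected to behave like get_tax_id and get_tax_data.

```python
def collect_phyla(species_list, find_taxid, fetch_record):
    """Map each species name to its phylum, or to None if none is found.

    find_taxid and fetch_record are assumed to behave like get_tax_id
    and get_tax_data from the script above.
    """
    phyla = {}
    for species in species_list:
        try:
            taxid = find_taxid(species)
        except IndexError:
            # An empty IdList makes the [0] lookup in get_tax_id fail
            phyla[species] = None
            continue
        record = fetch_record(taxid)
        phylum = None
        # LineageEx is a list of dicts with 'Rank' and 'ScientificName'
        for entry in record[0]['LineageEx']:
            if entry['Rank'] == 'phylum':
                phylum = entry['ScientificName']
                break
        phyla[species] = phylum
    return phyla

# Hypothetical usage with the functions from the script:
# phyla = collect_phyla(species_list, get_tax_id, get_tax_data)
```

Returning a dictionary keyed by species name keeps each result tied to its organism, which the three parallel lists above only do by position.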

Regarding "list - Trying to get taxonomic information from Biopython", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16504238/
