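I am working on a project that requires me to search PubMed using inputs from an Excel spreadsheet and print the counts of the results. I have been using xlrd and Entrez to do this. Here is what I have tried.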
I need to search PubMed using the author's name, his or her medical school, a range of years, and his or her mentor's name, all of which are in an Excel spreadsheet. I used xlrd to turn each column containing the required information into a list of strings.
import xlrd

sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)

med_name = []
for row in sheet.col(2):
    med_name.append(row)

med_school = []
for row in sheet.col(3):
    med_school.append(row)

mentor = []
for row in sheet.col(9):
    mentor.append(row)
I have successfully used Entrez to print the counts for a specific query.
from Bio import Entrez

Entrez.email = "your@email.edu"

handle = Entrez.egquery(term="Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) ")
handle_1 = Entrez.egquery(term="Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) AND Leoard P. Byk")
handle_2 = Entrez.egquery(term="Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) AND Southern Illinois University School of Medicine")

record = Entrez.read(handle)
record_1 = Entrez.read(handle_1)
record_2 = Entrez.read(handle_2)

pubmed_count = []
for row in record["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        pubmed_count.append(row["Count"])
for row in record_1["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        pubmed_count.append(row["Count"])
for row in record_2["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        pubmed_count.append(row["Count"])

print(pubmed_count)
# Output: ['3', '0', '0']
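The problem is that I need to replace the student name ("Jennifer Runch") with the next student name in the list of student names ("med_name"), the medical school with the next school in its list, and the current mentor's name with the next mentor's name in its list.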
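I think I should write a for loop after declaring my email to PubMed, but I am not sure how to link the two blocks of code together. Does anyone know an efficient way to connect the two blocks, or a more efficient way to do this than the one I have tried?
Thank you!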
Best answer
You have most of the code done already. It only needs a few small modifications.
Assuming your spreadsheet looks like this:
Jennifer Bunch |Southern Illinois University School of Medicine|Leonard P. Rybak
Philipp Robinson|Stanford University School of Medicine |Roger Kornberg
you can use the following code:
import xlrd
from Bio import Entrez

# NCBI asks for a contact address with every request; replace with your own.
Entrez.email = "your@email.edu"

sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)

# Read the sheet once: each entry is [name, school, mentor].
search_terms = list()
for row in range(0, sheet.nrows):
    search_terms.append([sheet.cell_value(row, 0), sheet.cell_value(row, 1), sheet.cell_value(row, 2)])

pubmed_counts = list()
for search_term in search_terms:
    # Three queries per student: name only, name + mentor, name + school.
    handle = Entrez.egquery(term="{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
    handle_1 = Entrez.egquery(term="{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))
    handle_2 = Entrez.egquery(term="{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))
    record = Entrez.read(handle)
    record_1 = Entrez.read(handle_1)
    record_2 = Entrez.read(handle_2)

    # Fixed-size list so each count stays in the slot of its query.
    pubmed_count = ['', '', '']
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[0] = row["Count"]
    for row in record_1["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[1] = row["Count"]
    for row in record_2["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[2] = row["Count"]

    print(pubmed_count)
    pubmed_counts.append(pubmed_count)
Output
['3', '0', '0']
['1', '0', '0']
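One caveat about the sheet-reading step: xlrd 2.0 and later read only .xls files, so opening HEENT.xlsx needs either an older xlrd or a different reader. The sketch below is one possible workaround, not part of the original answer. The helper name `rows_to_search_terms` is hypothetical, and the use of openpyxl in the usage comment is an assumption. The helper itself only needs an iterable of rows, so it works with whatever reader you choose.

```python
def rows_to_search_terms(rows):
    """Turn spreadsheet rows into [name, school, mentor] lists, skipping blank rows."""
    search_terms = []
    for row in rows:
        # Keep only the first three columns, as in the example table above.
        cells = ["" if cell is None else str(cell).strip() for cell in row[:3]]
        if any(cells):
            search_terms.append(cells)
    return search_terms


# Hypothetical usage with openpyxl (an assumption, not part of the original answer):
#   from openpyxl import load_workbook
#   sheet = load_workbook("HEENT.xlsx", read_only=True).worksheets[0]
#   search_terms = rows_to_search_terms(sheet.iter_rows(values_only=True))
```

Stripping whitespace here also removes the padding that appears after some cell values in the example table.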
The required modification is to use format to make the queries variable.
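To show the format step in isolation, here is a minimal sketch that builds the three query strings for one student without contacting PubMed. The names `DATE_RANGE` and `build_terms` are hypothetical and used only for illustration. Named placeholders are used instead of `{0}` and `{1}`, which is purely a readability choice.

```python
# The date filter is the same for every query, so it is kept in one place.
DATE_RANGE = "(2012[Date - Publication] : 2017[Date - Publication])"


def build_terms(name, school, mentor):
    """Return the three query strings: name only, name + mentor, name + school."""
    base = "{name} AND ({dates})".format(name=name, dates=DATE_RANGE)
    return [
        base,
        "{base} AND {extra}".format(base=base, extra=mentor),
        "{base} AND {extra}".format(base=base, extra=school),
    ]


for term in build_terms(
    "Jennifer Bunch",
    "Southern Illinois University School of Medicine",
    "Leonard P. Rybak",
):
    print(term)
```

Printing the terms before sending them is a cheap way to check that each spreadsheet value lands in the intended part of the query.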
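Some other modifications that are not strictly necessary but may help:
- Loop over the Excel sheet only once.
- Store pubmed_count in a predefined list, because if a value comes back empty the size of the output would vary, making it hard to tell which value belongs to which query.
- Everything can be optimized and tidied up further, for example by storing the queries in a list and looping over them, which would reduce code duplication, but for now it gets the job done.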
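The last two points can be sketched roughly as follows. This is one possible arrangement, not the answer's own code: the helper names `pubmed_count`, `counts_for_terms`, and `entrez_fetch` are hypothetical, and the lookup is passed in as a function so the looping logic can be tried without network access. `entrez_fetch` assumes `Entrez.egquery` is available in your Biopython version.

```python
def pubmed_count(record):
    """Return the PubMed count from a parsed egquery record, or '' if absent."""
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            return row["Count"]
    return ""


def counts_for_terms(terms, fetch):
    """Run every query in `terms` through `fetch`, keeping results aligned by position."""
    counts = [""] * len(terms)  # predefined size, so a missing value stays in its slot
    for i, term in enumerate(terms):
        counts[i] = pubmed_count(fetch(term))
    return counts


def entrez_fetch(term):
    """Real lookup, using the same Entrez calls as the code above."""
    from Bio import Entrez  # imported here so the helpers above work without Biopython
    return Entrez.read(Entrez.egquery(term=term))
```

With the three query strings for one student collected in a list `terms`, `counts_for_terms(terms, entrez_fetch)` would return one count per query in the same order, with an empty string wherever PubMed is missing from the result.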
For "python - XLRD/Entrez : Search through Pubmed and extract the counts", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39968425/