python - 从 pubmed 获取摘要

标签 python biopython

我在从以下查询获取摘要时遇到问题

Entrez.email = "anonymous@gmail.com"
esearch_query = Entrez.esearch(db="pubmed", term="cancer AND food", retmode="xml")
esearch_result = Entrez.read(esearch_query)

# Now we need to get all papers from our search using the IDList
for iden in esearch_result['IdList'][-1]:
    pubmed_entry = Entrez.efetch(db="pubmed", id=iden, retmode="xml")
    result = Entrez.read(pubmed_entry)
    print result

输出如下(仅以其中一个条目为例)。

{u'PubmedArticle': [{u'MedlineCitation': DictElement({u'DateCompleted': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'OtherID': [], u'DateRevised': {u'Month': '03', u'Day': '22', u'Year': '2017'}, u'MeshHeadingList': [{u'QualifierName': [], u'DescriptorName': StringElement('Binding Sites', attributes={u'UI': u'D001665', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Cobalt', attributes={u'UI': u'D003035', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hemoglobins', attributes={u'UI': u'D006454', u'MajorTopicYN': u'Y'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Humans', attributes={u'UI': u'D006801', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hydrogen-Ion Concentration', attributes={u'UI': u'D006863', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Iron', attributes={u'UI': u'D007501', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Ligands', attributes={u'UI': u'D008024', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Mathematics', attributes={u'UI': u'D008433', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'Y'})], u'DescriptorName': StringElement('Oxygen', attributes={u'UI': u'D010100', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Protein Binding', attributes={u'UI': u'D011485', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Spectrum Analysis', attributes={u'UI': u'D013057', u'MajorTopicYN': u'N'})}], u'OtherAbstract': [], u'CitationSubset': ['IM'], u'ChemicalList': [{u'NameOfSubstance': StringElement('Hemoglobins', attributes={u'UI': u'D006454'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Ligands', attributes={u'UI': u'D008024'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Cobalt', attributes={u'UI': u'D003035'}), u'RegistryNumber': '3G0H8C9362'}, {u'NameOfSubstance': StringElement('Iron', attributes={u'UI': u'D007501'}), u'RegistryNumber': 'E1UOL152H7'}, {u'NameOfSubstance': StringElement('Oxygen', attributes={u'UI': u'D010100'}), u'RegistryNumber': 'S88TT14065'}], u'KeywordList': [], u'DateCreated': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'SpaceFlightMission': [], u'GeneralNote': [], u'Article': DictElement({u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '1424-31'}, u'AuthorList': ListElement([DictElement({u'LastName': 'Chow', u'Initials': 'YW', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'Y W'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Pietranico', u'Initials': 'R', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'R'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Mukerji', u'Initials': 'A', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'A'}, attributes={u'ValidYN': u'Y'})], attributes={u'CompleteYN': u'Y'}), u'Language': ['eng'], u'PublicationTypeList': [StringElement('Journal Article', attributes={u'UI': u'D016428'}), StringElement("Research Support, U.S. Gov't, Non-P.H.S.", attributes={u'UI': u'D013486'})], u'Journal': {u'ISSN': StringElement('0006-291X', attributes={u'IssnType': u'Print'}), u'ISOAbbreviation': 'Biochem. Biophys. Res. Commun.', u'JournalIssue': DictElement({u'Volume': '66', u'Issue': '4', u'PubDate': {u'Month': 'Oct', u'Day': '27', u'Year': '1975'}}, attributes={u'CitedMedium': u'Print'}), u'Title': 'Biochemical and biophysical research communications'}, u'ArticleTitle': 'Studies of oxygen binding energy to hemoglobin molecule.', u'ELocationID': []}, attributes={u'PubModel': u'Print'}), u'PMID': StringElement('6', attributes={u'Version': u'1'}), u'MedlineJournalInfo': {u'MedlineTA': 'Biochem Biophys Res Commun', u'Country': 'United States', u'NlmUniqueID': '0372516', u'ISSNLinking': '0006-291X'}}, attributes={u'Status': u'MEDLINE', u'Owner': u'NLM'}), u'PubmedData': {u'ArticleIdList': [StringElement('6', attributes={u'IdType': u'pubmed'}), StringElement('0006-291X(75)90518-5', attributes={u'IdType': u'pii'})], u'PublicationStatus': 'ppublish', u'History': [DictElement({u'Month': '10', u'Day': '27', u'Year': '1975'}, attributes={u'PubStatus': u'pubmed'}), DictElement({u'Minute': '1', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'medline'}), DictElement({u'Minute': '0', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'entrez'})]}}], u'PubmedBookArticle': []}

如何获取摘要?最终的想法是在sql数据库中拥有一些字段(例如title,abstract..)。

谢谢, 大卫

最佳答案

可能对您不利的是,通常没有 1975 年之前的 MEDLINE PubMed 记录摘要 - 您的示例正好处于 1975 年的风口浪尖。我使用您的代码和不同的查询,显示了两个文章 ID ,一个有摘要,一个没有:

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"

esearch_query = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmode="xml")
esearch_result = Entrez.read(esearch_query)

for identifier in esearch_result['IdList']:
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)

    article = result['PubmedArticle'][0]['MedlineCitation']['Article']

    if 'Abstract' in article:
        print(article['Abstract']['AbstractText'])

截断输出

['This report catalogues all spontaneous proliferations in macropods, koalas, wombats, and possums and gliders held by the Comparative Pathology Registry at Taronga Zoo. Proliferative lesions were present in 14 macropods, 26 koalas, two wombats and 22 possums and gliders. Most neoplasms recorded in macropods were singular and ....']

详情可见文档:MEDLINE PubMed XML Element Descriptions

关于python - 从 pubmed 获取摘要,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45979650/

相关文章:

python - 为什么此命令 python manage.py shell 出现错误?

python - TypeError : tuple indices must be integers or slices, 命令中不是 str

python - 如何使用 Bio.PDB 分别保存 PDB 文件中的每个配体?

python - 从打乱的 n-gram 列表中生成可能的句子(python)

python - long 的结构长度取决于字节顺序?

python - C++ 到更简单的语言(Python、Lua 等)转换器?

javascript - Python:如何访问网页,单击特定链接并将其中的数据复制到文本文件?

python - 如何根据步长提取短序列?

python - Biopython 中 BioPerl 的 Bio::DB::Fasta 的等效函数是什么?

Python:如何根据位置输出FASTA头或染色体索引图?