python - 从 pubmed 获取摘要

我在从以下查询获取摘要时遇到问题

Entrez.email = "anonymous@gmail.com"
esearch_query = Entrez.esearch(db="pubmed", term="cancer AND food", retmode="xml")
esearch_result = Entrez.read(esearch_query)

# Now we need to get all papers from our search using the IDList
for iden in esearch_result['IdList'][-1]:
    pubmed_entry = Entrez.efetch(db="pubmed", id=iden, retmode="xml")
    result = Entrez.read(pubmed_entry)
    print result

输出如下(仅以其中一个条目为例)。

{u'PubmedArticle': [{u'MedlineCitation': DictElement({u'DateCompleted': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'OtherID': [], u'DateRevised': {u'Month': '03', u'Day': '22', u'Year': '2017'}, u'MeshHeadingList': [{u'QualifierName': [], u'DescriptorName': StringElement('Binding Sites', attributes={u'UI': u'D001665', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Cobalt', attributes={u'UI': u'D003035', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hemoglobins', attributes={u'UI': u'D006454', u'MajorTopicYN': u'Y'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Humans', attributes={u'UI': u'D006801', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hydrogen-Ion Concentration', attributes={u'UI': u'D006863', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Iron', attributes={u'UI': u'D007501', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Ligands', attributes={u'UI': u'D008024', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Mathematics', attributes={u'UI': u'D008433', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'Y'})], u'DescriptorName': StringElement('Oxygen', attributes={u'UI': u'D010100', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Protein Binding', attributes={u'UI': u'D011485', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Spectrum Analysis', attributes={u'UI': u'D013057', u'MajorTopicYN': u'N'})}], u'OtherAbstract': [], u'CitationSubset': ['IM'], u'ChemicalList': [{u'NameOfSubstance': StringElement('Hemoglobins', attributes={u'UI': u'D006454'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Ligands', attributes={u'UI': u'D008024'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Cobalt', attributes={u'UI': u'D003035'}), u'RegistryNumber': '3G0H8C9362'}, {u'NameOfSubstance': StringElement('Iron', attributes={u'UI': u'D007501'}), u'RegistryNumber': 'E1UOL152H7'}, {u'NameOfSubstance': StringElement('Oxygen', attributes={u'UI': u'D010100'}), u'RegistryNumber': 'S88TT14065'}], u'KeywordList': [], u'DateCreated': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'SpaceFlightMission': [], u'GeneralNote': [], u'Article': DictElement({u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '1424-31'}, u'AuthorList': ListElement([DictElement({u'LastName': 'Chow', u'Initials': 'YW', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'Y W'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Pietranico', u'Initials': 'R', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'R'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Mukerji', u'Initials': 'A', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'A'}, attributes={u'ValidYN': u'Y'})], attributes={u'CompleteYN': u'Y'}), u'Language': ['eng'], u'PublicationTypeList': [StringElement('Journal Article', attributes={u'UI': u'D016428'}), StringElement("Research Support, U.S. Gov't, Non-P.H.S.", attributes={u'UI': u'D013486'})], u'Journal': {u'ISSN': StringElement('0006-291X', attributes={u'IssnType': u'Print'}), u'ISOAbbreviation': 'Biochem. Biophys. Res. Commun.', u'JournalIssue': DictElement({u'Volume': '66', u'Issue': '4', u'PubDate': {u'Month': 'Oct', u'Day': '27', u'Year': '1975'}}, attributes={u'CitedMedium': u'Print'}), u'Title': 'Biochemical and biophysical research communications'}, u'ArticleTitle': 'Studies of oxygen binding energy to hemoglobin molecule.', u'ELocationID': []}, attributes={u'PubModel': u'Print'}), u'PMID': StringElement('6', attributes={u'Version': u'1'}), u'MedlineJournalInfo': {u'MedlineTA': 'Biochem Biophys Res Commun', u'Country': 'United States', u'NlmUniqueID': '0372516', u'ISSNLinking': '0006-291X'}}, attributes={u'Status': u'MEDLINE', u'Owner': u'NLM'}), u'PubmedData': {u'ArticleIdList': [StringElement('6', attributes={u'IdType': u'pubmed'}), StringElement('0006-291X(75)90518-5', attributes={u'IdType': u'pii'})], u'PublicationStatus': 'ppublish', u'History': [DictElement({u'Month': '10', u'Day': '27', u'Year': '1975'}, attributes={u'PubStatus': u'pubmed'}), DictElement({u'Minute': '1', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'medline'}), DictElement({u'Minute': '0', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'entrez'})]}}], u'PubmedBookArticle': []}

如何获取摘要？最终的想法是在sql数据库中拥有一些字段(例如title，abstract..)。

谢谢，大卫

最佳答案

可能对您不利的是，通常没有 1975 年之前的 MEDLINE PubMed 记录摘要 - 您的示例正好处于 1975 年的风口浪尖。我使用您的代码和不同的查询，显示了两个文章 ID ，一个有摘要，一个没有:

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"

esearch_query = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmode="xml")
esearch_result = Entrez.read(esearch_query)

for identifier in esearch_result['IdList']:
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)

    article = result['PubmedArticle'][0]['MedlineCitation']['Article']

    if 'Abstract' in article:
        print(article['Abstract']['AbstractText'])

截断输出

['This report catalogues all spontaneous proliferations in macropods, koalas, wombats, and possums and gliders held by the Comparative Pathology Registry at Taronga Zoo. Proliferative lesions were present in 14 macropods, 26 koalas, two wombats and 22 possums and gliders. Most neoplasms recorded in macropods were singular and ....']

详情可见文档:MEDLINE PubMed XML Element Descriptions

关于python - 从 pubmed 获取摘要，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/45979650/

python - 从 pubmed 获取摘要

上一篇：python - 在 python 中屏蔽然后粘贴两个图像

下一篇：python - 根据另一个数据框中的列名选择数据框中的行