python - Query PubMed with Python - How to get all article details from a query into a Pandas DataFrame and export them as CSV

Tags: python pandas dictionary pubmed

How can I get all article details from a PubMed query into a Pandas DataFrame and export them all to CSV?

I need the following article details:

pubmed_id、标题、关键字、期刊、摘要、结论、方法、结果、版权、doi、publication_date、作者
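The fields listed above map onto keys of the dictionary that pymed's toDict() returns for each result. A minimal sketch of pulling exactly these columns out of one record, assuming that some records (for example book entries) may lack a few of the keys; the helper name pick_fields and the sample record are hypothetical:

```python
# Columns requested in the question, in output order
FIELDS = [
    "pubmed_id", "title", "keywords", "journal", "abstract", "conclusions",
    "methods", "results", "copyrights", "doi", "publication_date", "authors",
]


def pick_fields(record: dict) -> dict:
    """Keep only the requested fields; a missing key becomes None instead of raising KeyError."""
    return {field: record.get(field) for field in FIELDS}


# Hypothetical record without journal, keywords, methods, etc.
book_like = {"pubmed_id": "12345", "title": "Example chapter", "abstract": "Example abstract"}
row = pick_fields(book_like)
print(row["title"], row["journal"])  # Example chapter None
```

Using dict.get keeps every output row the same shape, which is what pandas needs to build consistent columns.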

Best answer

This is how I do it. It is fully working code; all you need to do is install pymed (pip install pymed) and pandas. Here is the code:

from pymed import PubMed
import pandas as pd

pubmed = PubMed(tool="PubMedSearcher", email="myemail@ccc.com")

## PUT YOUR SEARCH TERM HERE ##
search_term = "Your search term"
results = pubmed.query(search_term, max_results=500)
articleList = []
articleInfo = []

for article in results:
    # Each result is either a PubMedArticle or a PubMedBookArticle object.
    # Convert it to a dictionary with its toDict() method.
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding every article detail the PubMed API returned
for article in articleList:
    # article['pubmed_id'] sometimes holds several IDs separated by newlines;
    # the first one is the article's own PubMed ID
    pubmedId = (article.get('pubmed_id') or '').partition('\n')[0]
    # Some records (e.g. book articles) may lack some of these keys,
    # so .get() returns None instead of raising KeyError
    articleInfo.append({'pubmed_id': pubmedId,
                        'title': article.get('title'),
                        'keywords': article.get('keywords'),
                        'journal': article.get('journal'),
                        'abstract': article.get('abstract'),
                        'conclusions': article.get('conclusions'),
                        'methods': article.get('methods'),
                        'results': article.get('results'),
                        'copyrights': article.get('copyrights'),
                        'doi': article.get('doi'),
                        'publication_date': article.get('publication_date'),
                        'authors': article.get('authors')})

# Generate a Pandas DataFrame from the list of dictionaries and export it as CSV
articlesPD = pd.DataFrame.from_dict(articleInfo)
articlesPD.to_csv(r'C:\Users\YourUsername\Desktop\export_dataframe.csv', index=False, header=True)

# Print the first 10 rows of the DataFrame
print(articlesPD.head(10))
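Columns such as authors and keywords hold Python lists, so to_csv writes their raw repr into the file. A minimal sketch of flattening them into readable strings before export, assuming each author is a dict with "firstname" and "lastname" keys (as pymed's author records appear to be) and that keywords is a list of strings that may contain None. The helper names and the sample records are hypothetical:

```python
import pandas as pd


def first_pubmed_id(raw):
    """Keep the first of several newline-separated IDs; non-strings pass through unchanged."""
    return raw.partition("\n")[0] if isinstance(raw, str) else raw


def format_authors(authors):
    """Join author dicts into 'First Last; First Last' (assumes 'firstname'/'lastname' keys)."""
    if not isinstance(authors, list):
        return ""
    names = []
    for author in authors:
        parts = (author.get("firstname"), author.get("lastname"))
        name = " ".join(p for p in parts if p)
        if name:
            names.append(name)
    return "; ".join(names)


def format_keywords(keywords):
    """Join keyword strings with '; ', skipping empty entries."""
    if not isinstance(keywords, list):
        return ""
    return "; ".join(k for k in keywords if k)


def to_flat_frame(records):
    """Build a DataFrame whose list-valued cells are converted to plain strings."""
    df = pd.DataFrame(records)
    df["pubmed_id"] = df["pubmed_id"].map(first_pubmed_id)
    df["authors"] = df["authors"].map(format_authors)
    df["keywords"] = df["keywords"].map(format_keywords)
    return df


# Hypothetical records shaped like the dicts built in the answer above
records = [
    {"pubmed_id": "111\n222", "title": "Example A",
     "keywords": ["cancer", None, "genomics"],
     "authors": [{"firstname": "Ada", "lastname": "Lovelace"}, {"lastname": "Curie"}]},
    {"pubmed_id": "333", "title": "Example B", "keywords": None, "authors": []},
]
flat = to_flat_frame(records)
# With no path argument to_csv returns the CSV text; pass a file path to write to disk
csv_text = flat.to_csv(index=False)
print(csv_text)
```

The "; " separator is an arbitrary choice that avoids clashing with the CSV comma delimiter; pandas would quote the cell anyway if it contained a comma.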

Original question on Stack Overflow (python - Query PubMed with Python - How to get all article details from a query into a Pandas DataFrame and export them as CSV): https://stackoverflow.com/questions/57053378/
