python - Getting data from the World Bank API using pandas

Tags: python pandas dataframe api

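I'm trying to build a data table that contains only the country, year and value from the World Bank API, but I can't seem to filter out just the data I want. I found that similar questions have been asked before, but none of the answers seem to work.

Any help would be greatly appreciated. Thanks!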

import requests
import pandas as pd
from bs4 import BeautifulSoup
import json
url ="http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
country = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]

html = {}
for i in country:
    url_one = url.format(i)
    html[i] = requests.get(url_one).json()
my_values = []
for i in country:
    value = html[i][1][0]['value']
    my_values.append(value)

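Edit

My data currently looks like this. I'm trying to extract the country name from {'country': {'id': 'AO', 'value': 'Angola'}}, as well as the 'date' and 'value' fields.

Edit 2: I got the data I was looking for, but every entry is repeated twice.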

Best answer

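Note: It is probably a good idea to store the information for all years at once rather than for just one year, which lets you simply filter it later in your processing. Also take a look at your country list: a "," is missing between "GRC""HUN". Without the comma, Python joins the two adjacent string literals into the single string "GRCHUN", so neither Greece nor Hungary is requested.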
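A short illustration of the missing-comma problem, using a shortened copy of the country list. The length check at the end is just one cheap guard (an assumption on my part, not something the API requires): ISO 3166-1 alpha-3 codes are always three letters, so anything longer points to a typo like this one.

```python
# Adjacent string literals with no comma between them are joined by the
# Python parser into one string, so the list silently loses an entry.
with_bug = ["GEO", "DEU", "GRC""HUN", "ISL"]
fixed = ["GEO", "DEU", "GRC", "HUN", "ISL"]

print(with_bug)                   # ['GEO', 'DEU', 'GRCHUN', 'ISL']
print(len(with_bug), len(fixed))  # 4 5

# Cheap sanity check: alpha-3 country codes are exactly three characters.
bad = [c for c in with_bug if len(c) != 3]
print(bad)                        # ['GRCHUN']
```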

There are different ways to achieve your goal; here are just two of them to point you in the right direction.

Option #1

Select the information you need from the JSON response, build a reshaped dict and append() it to my_values:

for d in data[1]:

    my_values.append({
        'country':d['country']['value'],
        'date':d['date'],
        'value':d['value']
    })
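The loop above assumes the v2 JSON layout, where `data[0]` holds paging metadata and `data[1]` holds the list of records. A minimal sketch of the same reshaping as a reusable function; the name `reshape`, the guard for responses without a record list, and the sample payload are all hypothetical. The sample keeps only the keys the loop reads, and its numbers are placeholders, not real GDP figures.

```python
def reshape(payload):
    """Keep only country name, date and value from one World Bank JSON page.

    Assumes payload[0] is the paging metadata and payload[1] is the list
    of observation records.
    """
    # Hypothetical guard: error responses or empty results carry no record list.
    if len(payload) < 2 or payload[1] is None:
        return []
    return [
        {"country": d["country"]["value"], "date": d["date"], "value": d["value"]}
        for d in payload[1]
    ]


# Made-up payload in the response's shape; values are placeholders.
sample = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"country": {"id": "AO", "value": "Angola"}, "date": "2016", "value": 111.0},
        {"country": {"id": "AO", "value": "Angola"}, "date": "2015", "value": 222.0},
    ],
]

rows = reshape(sample)
print(rows[0])  # {'country': 'Angola', 'date': '2016', 'value': 111.0}
```

The resulting list of flat dicts can be passed straight to `pd.DataFrame(...)`, exactly like `my_values` in the example.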

Example

import requests
import pandas as pd


# %s is filled in with an ISO3 country code; there is no date filter, so all years are requested.
# The API paginates its results (50 records per page by default), so each request returns only the first page.
url = 'http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?format=json'
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]

my_values = []
for country in countries:
    # data[0] is paging metadata, data[1] is the list of records
    data = requests.get(url % country).json()

    try:
        for d in data[1]:
            # keep only the country name, the year and the value
            my_values.append({
                'country': d['country']['value'],
                'date': d['date'],
                'value': d['value']
            })
    except Exception as err:
        # e.g. the response has no record list for this country code
        print(f'[ERROR] country ==> {country} error ==> {err}')

df = pd.DataFrame(my_values).sort_values(['country', 'date'], ascending=True)
print(df)
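Because the URL in the example does not set `per_page`, each request most likely returns only the first page of results, which may be why the output starts in 1971. One fix is to pass a large `per_page`; the sketch below instead walks the `page` query parameter until it reaches the page count reported in `payload[0]["pages"]`. It is an assumption-laden sketch: `fetch_json` and `fetch_all_pages` are hypothetical helpers, stdlib `urllib` stands in for `requests`, and the fetch function is injectable so the loop can be checked without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and indicator copied from the question; helper names are hypothetical.
BASE = "http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD"


def fetch_json(url):
    """Download one URL and decode the body as JSON (stdlib only)."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all_pages(country, fetch=fetch_json, per_page=100):
    """Return every record for one country by walking the `page` parameter.

    Assumes payload[0] is the paging metadata with a "pages" entry and
    payload[1] is the list of records for that page.
    """
    records = []
    page, pages = 1, 1
    while page <= pages:
        query = urlencode({"format": "json", "per_page": per_page, "page": page})
        payload = fetch(f"{BASE.format(country)}?{query}")
        if len(payload) < 2 or payload[1] is None:
            break  # error message or no data for this country
        pages = int(payload[0]["pages"])  # int() in case it arrives as a string
        records.extend(payload[1])
        page += 1
    return records
```

Usage would be something like `all_records = [r for c in countries for r in fetch_all_pages(c)]`, followed by the same reshaping as in Option #1.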

Option #2

Create DataFrames directly from the JSON response, concatenate them and make a few adjustments to the final DataFrame:

for d in data[1]:
    my_values.append(pd.DataFrame(d))

...

pd.concat(my_values).loc[['value']][['country','date','value']].sort_values(['country', 'date'], ascending=True)
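Why `.loc[['value']]` is needed here can be seen on a single record, and this is one plausible source of the duplicated rows described in Edit 2. A minimal sketch, assuming a trimmed-down record with only three of the keys the API returns; the number 1234.5 is a placeholder, not a real GDP value.

```python
import pandas as pd

# One record in the shape of the API response (trimmed; value is a placeholder).
record = {
    "country": {"id": "AO", "value": "Angola"},
    "date": "2015",
    "value": 1234.5,
}

# The nested dict's keys ("id", "value") become the index, and the scalar
# fields are broadcast to every row, so each record yields two rows.
df = pd.DataFrame(record)
print(df)

# Keeping only the row labelled "value" leaves the readable country name.
print(df.loc[["value"]])
```

Without that row selection every record appears twice, once with the country id ("AO") and once with the name ("Angola").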

Output

country   date     value
Algeria   1971   341.389
Algeria   1972   442.678
Algeria   1973   554.293
Algeria   1974   818.008
Algeria   1975   936.79
...       ...    ...
Zimbabwe  2016  1464.59
Zimbabwe  2017  1235.19
Zimbabwe  2018  1254.64
Zimbabwe  2019  1316.74
Zimbabwe  2020  1214.51
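Following the note at the top of this answer (store all years, filter later), a minimal sketch of filtering after the fact. The small frame is a hypothetical stand-in for the result above and its values are placeholders; the one real detail it relies on is that the API returns `date` as a string, so it is converted once before numeric comparisons.

```python
import pandas as pd

# Hypothetical stand-in for the answer's result frame; values are placeholders.
df = pd.DataFrame(
    {
        "country": ["Angola", "Angola", "Zimbabwe", "Zimbabwe"],
        "date": ["2014", "2015", "2014", "2015"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# "date" comes back as a string, so convert it before comparing numerically.
df["date"] = df["date"].astype(int)

only_2015 = df[df["date"] == 2015]            # a single year
recent = df[df["date"].between(2015, 2020)]   # an inclusive range of years
wide = df.pivot(index="date", columns="country", values="value")  # years x countries
```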

Regarding python - Getting data from the World Bank API using pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70575998/
