I'm trying to unnest the Congress data available here: https://theunitedstates.io/congress-legislators/legislators-historical.json
Sample structure:
{
"id": {
"bioguide": "B000226",
"govtrack": 401222,
"icpsr": 507,
"wikipedia": "Richard Bassett (politician)",
"wikidata": "Q518823",
"google_entity_id": "kg:/m/02pz46"
},
"name": {
"first": "Richard",
"last": "Bassett"
},
"bio": {
"birthday": "1745-04-02",
"gender": "M"
},
"terms": [
{
"type": "sen",
"start": "1789-03-04",
"end": "1793-03-03",
"state": "DE",
"class": 2,
"party": "Anti-Administration"
}
]
}
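If I just use json_normalize(data), the "terms" field is not unnested.
If I try to unnest terms specifically, e.g. json_normalize(data, 'terms', 'name'), then the other fields I include (here, name) stay in dict form, {u'last': u'Bassett', u'first': u'Richard'}, as the row entries.
Full current code, if you want to run it: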
import json
import urllib
import pandas as pd
from pandas.io.json import json_normalize
# load data
url = "https://theunitedstates.io/congress-legislators/legislators-historical.json"
json_url = urllib.urlopen(url)
data = json.loads(json_url.read())
# parse
congress_names = json_normalize(data, record_path='terms',meta='name')
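The two symptoms described above can be reproduced offline on a single inline record. This is only a minimal sketch: it assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize), and the record is a trimmed copy of the sample structure rather than the full dataset.

```python
import pandas as pd

# Trimmed copy of the sample record shown above (not the full dataset).
data = [
    {
        "id": {"bioguide": "B000226"},
        "name": {"first": "Richard", "last": "Bassett"},
        "terms": [
            {"type": "sen", "start": "1789-03-04", "end": "1793-03-03", "state": "DE"}
        ],
    }
]

# Symptom 1: nested dicts are flattened, but the 'terms' list stays a list.
flat = pd.json_normalize(data)
print(type(flat.loc[0, "terms"]))

# Symptom 2: 'terms' is unnested, but the 'name' meta column still holds a dict.
by_term = pd.json_normalize(data, record_path="terms", meta="name")
print(type(by_term.loc[0, "name"]))
```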
Best answer
I think the code below should work. There may be a better way to normalize this, but I don't know of one.
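import json

import pandas as pd
import requests

# pd.json_normalize is available from pandas 1.0; older versions import it from pandas.io.json
url = 'https://theunitedstates.io/congress-legislators/legislators-historical.json'
resp = requests.get(url)
raw_dict = json.loads(resp.text)

frames = []
for legislator in raw_dict:
    # One row per term, carrying the legislator's 'name' dict along as a column
    df1 = pd.json_normalize(legislator, record_path=['terms'], meta=['name'])
    # Expand the 'name' dict into separate columns (first, last, ...)
    df1 = pd.concat([df1, df1['name'].apply(pd.Series)], axis=1)
    frames.append(df1)

# Stack all legislators; sort=True orders the columns alphabetically
df = pd.concat(frames, axis=0, ignore_index=True, sort=True)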
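A possible alternative to the per-legislator loop, offered as a sketch rather than as the answer's method: the meta argument of json_normalize also accepts nested key paths given as lists, so the name fields can be pulled out as flat columns in a single call. This assumes pandas 1.0 or later. The records below are inline stand-ins for the downloaded data: the first is trimmed from the sample structure, and the second ("Jane Doe") is entirely hypothetical, added only to show a legislator with more than one term.

```python
import pandas as pd

# Inline stand-in for the downloaded list of legislators.
data = [
    {
        "id": {"bioguide": "B000226"},
        "name": {"first": "Richard", "last": "Bassett"},
        "terms": [
            {"type": "sen", "start": "1789-03-04", "end": "1793-03-03", "state": "DE"}
        ],
    },
    {
        # Hypothetical record, for illustration only.
        "id": {"bioguide": "X000000"},
        "name": {"first": "Jane", "last": "Doe"},
        "terms": [
            {"type": "rep", "start": "1801-03-04", "end": "1803-03-03", "state": "NY"},
            {"type": "sen", "start": "1803-03-04", "end": "1809-03-03", "state": "NY"},
        ],
    },
]

# Each inner list is a path into the parent record; the resulting
# columns are named by joining the path with '.', e.g. 'name.first'.
df = pd.json_normalize(
    data,
    record_path="terms",
    meta=[["name", "first"], ["name", "last"], ["id", "bioguide"]],
)
print(df)
```

On the real file some legislators may lack a given key; json_normalize takes an errors='ignore' argument for that case, though whether it is needed here depends on the data.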
Regarding "python - Unnesting/normalizing JSON in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59673313/