I have a script that reads nested JSON into a pandas DataFrame, adds a new column to it, and saves the result back to JSON.
import numpy as np
import pandas as pd

sample_json = {
    "name": {
        "emails": [{"address": "clark.kent@example.com"}],
        "countries": [{"country": "US"}, {"country": "UK"}],
    }
}
# Flatten the nested JSON into dotted columns such as "name.emails"
# (pd.json_normalize replaces the deprecated pandas.io.json import)
df = pd.json_normalize(sample_json)
# Add the new column, empty for now
df["name.hobbies"] = np.nan
print(df)
df.to_json("sample.json", orient="records", lines=True)
My output looks like this:
{
"name.countries": [
{
"country": "US"
},
{
"country": "UK"
}
],
"name.emails": [
{
"address": "clark.kent@example.com"
}
],
"name.hobbies": null
}
I want to save the DataFrame as nested JSON, like this:
"name": {
"emails": [{"address": "clark.kent@example.com"}],
"countries": [{"country": "US"}, {"country": "UK"}],
"hobbies": null
}
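Is there a way to save the derived pandas DataFrame as nested JSON?
Best answer
In my opinion, the simplest way to get nested JSON is to work on the dictionary directly: add the new value, then convert to JSON at the end: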
import json

# Add the new key directly to the nested dictionary
sample_json['name']['hobbies'] = None
j = json.dumps(sample_json)
print(j)
{"name": {"emails": [{"address": "clark.kent@example.com"}],
"countries": [{"country": "US"}, {"country": "UK"}],
"hobbies": null}}
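Since the question writes one JSON object per line (`lines=True`), the same dictionary approach can be sketched for several records at once. This is only an illustration: the second record and the names `records` and `json_lines` are made up for the example and do not come from the original question.

```python
import json

# Two nested records; the second one is a hypothetical extra sample.
records = [
    {"name": {"emails": [{"address": "clark.kent@example.com"}],
              "countries": [{"country": "US"}, {"country": "UK"}]}},
    {"name": {"emails": [{"address": "lois.lane@example.com"}],
              "countries": [{"country": "US"}]}},
]

# Add the new nested key to every record without flattening anything.
for record in records:
    record.setdefault("name", {})["hobbies"] = None

# One JSON document per line, mirroring to_json(..., lines=True).
json_lines = "\n".join(json.dumps(record) for record in records)
print(json_lines)
```

Writing `json_lines` to a file would then give a line-delimited file with the nesting preserved.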
Pandas solution: create a MultiIndex by splitting the column names, then build a nested dictionary:
df.columns = df.columns.str.split('.', expand=True)
d = {level: df.xs(level, axis=1).squeeze().to_dict() for level in df.columns.levels[0]}
print (d)
{'name': {'countries': [{'country': 'US'}, {'country': 'UK'}],
'emails': [{'address': 'clark.kent@example.com'}],
'hobbies': nan}}
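The MultiIndex version above handles one level of nesting for a single-row frame. For column names with more than one dot, a plain-Python helper is one possible alternative. The sketch below is not part of the original answer; `unflatten_record` is a hypothetical name, and it assumes each record is a flat dict keyed by dotted column names.

```python
def unflatten_record(flat, sep="."):
    """Rebuild a nested dict from a flat dict whose keys are sep-joined paths."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(sep)
        node = nested
        # Walk down (or create) one intermediate dict per parent key.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# A flat record shaped like one row of the normalized DataFrame.
flat_record = {
    "name.emails": [{"address": "clark.kent@example.com"}],
    "name.countries": [{"country": "US"}, {"country": "UK"}],
    "name.hobbies": None,
}
nested_record = unflatten_record(flat_record)
print(nested_record)
```

With a DataFrame whose columns are still dotted strings, the same helper could be applied per row, for example `[unflatten_record(r) for r in df.to_dict(orient="records")]`.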
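To convert NaN to null, see the question "Python NaN JSON encoder"; the simplest options are to set None instead of NaN in the first place, or to replace missing values with None: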
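import json

# Cast to object first; in recent pandas versions a float column
# would otherwise keep NaN instead of accepting None
df = df.astype(object).where(df.notna(), None)
df.columns = df.columns.str.split('.', expand=True)
d = {level: df.xs(level, axis=1).squeeze().to_dict() for level in df.columns.levels[0]}
j = json.dumps(d)
print(j)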
{"name": {"countries": [{"country": "US"}, {"country": "UK"}],
"emails": [{"address": "clark.kent@example.com"}],
"hobbies": null}}
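If NaN values are already sitting inside the nested dictionary, another option is to clean them up just before serializing. The following is a minimal sketch using only the standard library; `nan_to_none` is a hypothetical helper name, not something from the original answer. Passing `allow_nan=False` makes `json.dumps` raise a `ValueError` if any NaN slips through, instead of emitting the non-standard `NaN` token.

```python
import json
import math


def nan_to_none(obj):
    """Recursively replace float NaN with None in dicts and lists."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_none(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [nan_to_none(value) for value in obj]
    return obj


# Same shape as the nested dictionary built from the DataFrame above.
d = {
    "name": {
        "countries": [{"country": "US"}, {"country": "UK"}],
        "emails": [{"address": "clark.kent@example.com"}],
        "hobbies": float("nan"),
    }
}
cleaned_json = json.dumps(nan_to_none(d), allow_nan=False)
print(cleaned_json)
```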
Regarding "python - pandas DataFrame as nested JSON", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56472566/