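python - Generate a pandas DataFrame from a dictionary whose values are lists of varying numbers of dictionaries

Tags: python pandas dataframe

I need to parse a JSON object into a DataFrame. The object is formatted like this: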

 {"219": [{"year": "2015", "code": "VU", "category": "Vulnerable"}, 
          {"year": "2008", "code": "VU", "category": "Vulnerable"}, 
          {"year": "2002", "code": "VU", "category": "Vulnerable"}, 
          {"year": "1996", "code": "VU", "category": "Vulnerable"}, 
          {"year": "1994", "code": "V", "category": "Vulnerable"}, 
          {"year": "1990", "code": "V", "category": "Vulnerable"}, 
          {"year": "1988", "code": "V", "category": "Vulnerable"}, 
          {"year": "1986", "code": "V", "category": "Vulnerable"}], 
  "561": [{"year": "2016", "code": "LC", "category": "Least Concern"}, 
          {"year": "2010", "code": "LC", "category": "Least Concern"}, 
          {"year": "2006", "code": "LC", "category": "Least Concern"}, 
          {"year": "1996", "code": "EN", "category": "Endangered"}, 
          {"year": "1994", "code": "R", "category": "Rare"}, 
          {"year": "1990", "code": "R", "category": "Rare"}, 
          {"year": "1988", "code": "R", "category": "Rare"}, 
          {"year": "1986", "code": "R", "category": "Rare"}], 
  "571": [{"year": "2016", "code": "LC", "category": "Least Concern"}, 
          {"year": "2008", "code": "LC", "category": "Least Concern"}, 
          {"year": "2004", "code": "LC", "category": "Least Concern"}, 
          {"year": "1996", "code": "LR/lc", "category": "Lower Risk/least concern"}]
          }

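Ultimately, I want the DataFrame to use the keys as rows, the years as columns (one column per year), and the codes as values. I don't need the category. Also, each key-value pair can have a variable number of dictionaries in its value list (but always with the same year/code/category structure).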

taxonid  1986 1988 1990  1994 1996 2002 2004 2006 2008 2010 2015 2016
219         V    V    V    V    VU   VU  NaN  NaN   VU  NaN   VU  NaN
561         R    R    R    R    EN  NaN  NaN   LC  NaN   LC  NaN   LC
571       NaN  NaN  NaN  NaN LR/lc  NaN   LC  NaN   LC  NaN  NaN   LC

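Is there a way to generate the DataFrame so that I don't have to declare all the years as columns first? Not all years are represented here, and it would be great to have code that can create an updated df each time a JSON object is received.

I have gone through many SO questions, but so far nothing has helped with this.

Best answer

If d is the dictionary from the question, then this example: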

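import pandas as pd
# Flatten to one row per record (assumes each record's keys are ordered year, code, category), then pivot the years into columns.
df = pd.DataFrame(((k, *dd.values()) for k, v in d.items() for dd in v), columns=['taxid', 'year', 'code', 'category'])
df = pd.pivot_table(df, values='code', index='taxid', columns='year', aggfunc='first')
print(df)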

prints:

year  1986 1988 1990 1994   1996 2002 2004 2006 2008 2010 2015 2016
taxid                                                              
219      V    V    V    V     VU   VU  NaN  NaN   VU  NaN   VU  NaN
561      R    R    R    R     EN  NaN  NaN   LC  NaN   LC  NaN   LC
571    NaN  NaN  NaN  NaN  LR/lc  NaN   LC  NaN   LC  NaN  NaN   LC
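
As a complement to the answer above, here is a minimal sketch of a variant that looks up year and code by name instead of relying on the key order inside each record, and that derives the year columns from whatever data it is given. The function name codes_by_year and the shortened sample string raw are made up for illustration and are not part of the original answer. Unlike aggfunc='first', this version keeps the last code if a taxon lists the same year twice.

```python
import json

import pandas as pd


def codes_by_year(records_by_taxon):
    """Return a frame with one row per taxon id and one column per year."""
    # Access fields by name, so the key order inside each record does not matter.
    nested = {
        taxid: {rec["year"]: rec["code"] for rec in recs}
        for taxid, recs in records_by_taxon.items()
    }
    # orient="index" turns the outer keys into rows; missing years become NaN.
    df = pd.DataFrame.from_dict(nested, orient="index")
    df.index.name = "taxid"
    df.columns.name = "year"
    # The years are four-digit strings, so a plain sort orders them correctly.
    return df.sort_index(axis=0).sort_index(axis=1)


# Hypothetical, shortened sample in the same shape as the question's JSON.
raw = """
{"219": [{"year": "2015", "code": "VU", "category": "Vulnerable"},
         {"year": "1994", "code": "V", "category": "Vulnerable"}],
 "571": [{"code": "LC", "category": "Least Concern", "year": "2016"}]}
"""

df = codes_by_year(json.loads(raw))
print(df)
```

Because the columns are computed on every call, running codes_by_year again on each newly received JSON object should give an up-to-date frame without declaring the years in advance.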

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/59523432/
