python - Pandas DataFrame: convert unique row label into non-unique rows

Tags: python pandas dataframe

I have this nested Python dictionary:

poke_dic = {
      'Basic': {'Fire': ['Cyndaquil', 'Charmander', 'Torchic'],
                'Grass': ['Chikorita', 'Bulbasaur', 'Treecko'],
                'Water': ['Totodile', 'Squirtle', 'Mudkip']},
     'Evo1': {'Fire': ['Quilava', 'Chameleon', 'Combusken'],
              'Grass': ['Bayleef', 'Ivysaur', 'Grovyle'],
              'Water': ['Croconaw', 'Wartortle', 'Marshtomp']},
     'Evo2': {'Fire': ['Typhlosion', 'Charizard', 'Blaziken'],
              'Grass': ['Meganium', 'Venusaur', 'Sceptile'],
              'Water': ['Feraligatr', 'Blastoise', 'Swampert']}
}

When I convert it to a DataFrame, it produces this table:

import pandas

poke_df = pandas.DataFrame(poke_dic)
poke_df

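                                  Basic                              Evo1                               Evo2
Fire   [Cyndaquil, Charmander, Torchic]   [Quilava, Chameleon, Combusken]  [Typhlosion, Charizard, Blaziken]
Grass   [Chikorita, Bulbasaur, Treecko]       [Bayleef, Ivysaur, Grovyle]     [Meganium, Venusaur, Sceptile]
Water      [Totodile, Squirtle, Mudkip]  [Croconaw, Wartortle, Marshtomp]  [Feraligatr, Blastoise, Swampert]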

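As we can see, the column labels are the keys of the first-level dictionary, and the row labels are the keys of the second-level dictionaries. Each entry is a list. I want to break the lists apart so that every element gets its own row, with the row label repeated on each new row.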

The expected output (first few rows) should look like this:

[image: expected output table, not preserved]

Is there a pandas command that lets me do this, or do I have to manipulate the dictionary first? Thanks in advance.

Best answer

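Option 1
You can build the right DataFrame from the start instead of creating one and then reshaping it. The catch is that I have to use enumerate in the comprehension to keep the index entries unique. You can reset that index level afterwards if you like.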

import pandas as pd

pd.DataFrame({
    evolution: {
        (element, i): poke
        for element, pokes in types.items()
        for i, poke in enumerate(pokes)
    }
    for evolution, types in poke_dic.items()
})

              Basic       Evo1        Evo2
Fire  0   Cyndaquil    Quilava  Typhlosion
      1  Charmander  Chameleon   Charizard
      2     Torchic  Combusken    Blaziken
Grass 0   Chikorita    Bayleef    Meganium
      1   Bulbasaur    Ivysaur    Venusaur
      2     Treecko    Grovyle    Sceptile
Water 0    Totodile   Croconaw  Feraligatr
      1    Squirtle  Wartortle   Blastoise
      2      Mudkip  Marshtomp    Swampert
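
The remark about resetting the helper level can be sketched as follows. This is a minimal illustration rather than part of the original answer; the names `wide` and `result` are made up for the example, and the dictionary is repeated so the snippet runs on its own.

```python
import pandas as pd

poke_dic = {
    'Basic': {'Fire': ['Cyndaquil', 'Charmander', 'Torchic'],
              'Grass': ['Chikorita', 'Bulbasaur', 'Treecko'],
              'Water': ['Totodile', 'Squirtle', 'Mudkip']},
    'Evo1': {'Fire': ['Quilava', 'Chameleon', 'Combusken'],
             'Grass': ['Bayleef', 'Ivysaur', 'Grovyle'],
             'Water': ['Croconaw', 'Wartortle', 'Marshtomp']},
    'Evo2': {'Fire': ['Typhlosion', 'Charizard', 'Blaziken'],
             'Grass': ['Meganium', 'Venusaur', 'Sceptile'],
             'Water': ['Feraligatr', 'Blastoise', 'Swampert']},
}

# Option 1 as written above: the (element, i) tuples become a two-level row index.
wide = pd.DataFrame({
    evolution: {
        (element, i): poke
        for element, pokes in types.items()
        for i, poke in enumerate(pokes)
    }
    for evolution, types in poke_dic.items()
})

# Drop the enumerate counter (level 1), name the remaining level,
# and move it into an ordinary column called 'Type'.
result = wide.reset_index(level=1, drop=True).rename_axis('Type').reset_index()
print(result)
```

The counter is only needed while the frame is being built, so dropping it afterwards leaves the repeated type labels the question asks for.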

Option 2
More comprehensions, but this time with pd.concat

pd.concat({
    ev: pd.Series(*zip(*(
        (p, e) for e, t in x.items() for p in t
    ))) for ev, x in poke_dic.items()
}, axis=1)

            Basic       Evo1        Evo2
Fire    Cyndaquil    Quilava  Typhlosion
Fire   Charmander  Chameleon   Charizard
Fire      Torchic  Combusken    Blaziken
Grass   Chikorita    Bayleef    Meganium
Grass   Bulbasaur    Ivysaur    Venusaur
Grass     Treecko    Grovyle    Sceptile
Water    Totodile   Croconaw  Feraligatr
Water    Squirtle  Wartortle   Blastoise
Water      Mudkip  Marshtomp    Swampert
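
The `pd.Series(*zip(*...))` expression is dense, so here is a rough step-by-step unpacking for a single evolution stage. The variable names (`pairs`, `values`, `labels`, `basic`) and the two-type sample dictionary are illustrative only.

```python
import pandas as pd

types = {'Fire': ['Cyndaquil', 'Charmander', 'Torchic'],
         'Grass': ['Chikorita', 'Bulbasaur', 'Treecko']}

# Step 1: one (pokemon, type) pair per list element.
pairs = [(p, e) for e, t in types.items() for p in t]

# Step 2: zip(*pairs) transposes the pairs into two tuples:
# all the pokemon names first, then the matching type labels.
values, labels = zip(*pairs)

# Step 3: the first tuple is the data and the second is the index,
# which is what pd.Series(*zip(*pairs)) does in a single expression.
basic = pd.Series(values, labels)
print(basic)
```

Because every stage yields the same sequence of labels, the resulting Series share an identical, non-unique index, and that is the index shown in the Option 2 output.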

Exactly what the OP asked for

pd.concat({
    ev: pd.Series(*zip(*(
        (p, e) for e, t in x.items() for p in t
    ))) for ev, x in poke_dic.items()
}, axis=1).rename_axis('Type').reset_index()

    Type       Basic       Evo1        Evo2
0   Fire   Cyndaquil    Quilava  Typhlosion
1   Fire  Charmander  Chameleon   Charizard
2   Fire     Torchic  Combusken    Blaziken
3  Grass   Chikorita    Bayleef    Meganium
4  Grass   Bulbasaur    Ivysaur    Venusaur
5  Grass     Treecko    Grovyle    Sceptile
6  Water    Totodile   Croconaw  Feraligatr
7  Water    Squirtle  Wartortle   Blastoise
8  Water      Mudkip  Marshtomp    Swampert

Option W/E
None of these options is clean, so I'll keep at it until one feels right.

pd.concat({k: pd.DataFrame(v) for k, v in poke_dic.items()}).T.stack() \
  .reset_index(1, drop=True).rename_axis('Type').reset_index()

    Type       Basic       Evo1        Evo2
0   Fire   Cyndaquil    Quilava  Typhlosion
1   Fire  Charmander  Chameleon   Charizard
2   Fire     Torchic  Combusken    Blaziken
3  Grass   Chikorita    Bayleef    Meganium
4  Grass   Bulbasaur    Ivysaur    Venusaur
5  Grass     Treecko    Grovyle    Sceptile
6  Water    Totodile   Croconaw  Feraligatr
7  Water    Squirtle  Wartortle   Blastoise
8  Water      Mudkip  Marshtomp    Swampert
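
To make the chained expression above easier to follow, the sketch below splits it into its intermediate steps. The step names are invented for the illustration. On some pandas 2.x releases `stack()` may emit a FutureWarning about its new implementation, which should not affect the result here.

```python
import pandas as pd

poke_dic = {
    'Basic': {'Fire': ['Cyndaquil', 'Charmander', 'Torchic'],
              'Grass': ['Chikorita', 'Bulbasaur', 'Treecko'],
              'Water': ['Totodile', 'Squirtle', 'Mudkip']},
    'Evo1': {'Fire': ['Quilava', 'Chameleon', 'Combusken'],
             'Grass': ['Bayleef', 'Ivysaur', 'Grovyle'],
             'Water': ['Croconaw', 'Wartortle', 'Marshtomp']},
    'Evo2': {'Fire': ['Typhlosion', 'Charizard', 'Blaziken'],
             'Grass': ['Meganium', 'Venusaur', 'Sceptile'],
             'Water': ['Feraligatr', 'Blastoise', 'Swampert']},
}

# Step 1: one small frame per evolution stage, stacked vertically.
# Rows are (stage, position) pairs; columns are the types.
step1 = pd.concat({k: pd.DataFrame(v) for k, v in poke_dic.items()})

# Step 2: transpose so types become rows and (stage, position) become columns.
step2 = step1.T

# Step 3: stack() moves the innermost column level (position) into the rows,
# leaving one column per stage.
step3 = step2.stack()

# Step 4: drop the position level and expose the type as a regular column.
result = step3.reset_index(1, drop=True).rename_axis('Type').reset_index()

for name, frame in [('step1', step1), ('step2', step2), ('step3', step3)]:
    print(name, frame.shape)
print(result)
```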

An option similar to @Wen's

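import numpy as np

pd.DataFrame(
    np.column_stack([
        # repeat each type label once per list element
        poke_df.index.repeat(3),
        # axes (type, stage, position) -> (type, position, stage), then flatten to 9 rows x 3 columns
        np.array(poke_df.values.tolist()).transpose(0, 2, 1).reshape(-1, 3),
    ]),
    columns=['Type'] + poke_df.columns.tolist()
)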

    Type       Basic       Evo1        Evo2
0   Fire   Cyndaquil    Quilava  Typhlosion
1   Fire  Charmander  Chameleon   Charizard
2   Fire     Torchic  Combusken    Blaziken
3  Grass   Chikorita    Bayleef    Meganium
4  Grass   Bulbasaur    Ivysaur    Venusaur
5  Grass     Treecko    Grovyle    Sceptile
6  Water    Totodile   Croconaw  Feraligatr
7  Water    Squirtle  Wartortle   Blastoise
8  Water      Mudkip  Marshtomp    Swampert
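
As a further alternative that is not in the original answer, newer pandas versions can do this directly with `DataFrame.explode`. The sketch below assumes pandas 1.3 or later, where `explode` accepts a list of columns, and it relies on the lists in each row having equal lengths.

```python
import pandas as pd

poke_dic = {
    'Basic': {'Fire': ['Cyndaquil', 'Charmander', 'Torchic'],
              'Grass': ['Chikorita', 'Bulbasaur', 'Treecko'],
              'Water': ['Totodile', 'Squirtle', 'Mudkip']},
    'Evo1': {'Fire': ['Quilava', 'Chameleon', 'Combusken'],
             'Grass': ['Bayleef', 'Ivysaur', 'Grovyle'],
             'Water': ['Croconaw', 'Wartortle', 'Marshtomp']},
    'Evo2': {'Fire': ['Typhlosion', 'Charizard', 'Blaziken'],
             'Grass': ['Meganium', 'Venusaur', 'Sceptile'],
             'Water': ['Feraligatr', 'Blastoise', 'Swampert']},
}

poke_df = pd.DataFrame(poke_dic)

# Explode every list-valued column at once; each row label is repeated
# once per list element (assumes pandas >= 1.3 for multi-column explode).
result = (
    poke_df.explode(list(poke_df.columns))
           .rename_axis('Type')
           .reset_index()
)
print(result)
```

This starts from the `poke_df` in the question, so the dictionary does not have to be restructured first.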

Regarding "python - Pandas DataFrame: convert unique row label into non-unique rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49000514/
