python - pandas reshape of tuples - cannot reindex from a duplicate axis

Given a pandas DataFrame defined as:

import pandas as pd
df = pd.DataFrame({'id':[1,2,3], 're_foo':[1,2,3], 're_bar':[4,5,6], 're_foo_baz':[0.4, 0.8, .9], 're_bar_baz':[.4,.5,.6], 'iteration':[1,2,3]})
display(df)

I want to reshape it into the following format:

id, metric_kind, foo, bar, iteration
1,  regular,     1,   4,   1
1,  baz,         0.4, 0.4, 1
...

From "pandas reshape multiple columns fails with KeyError" I learned that:

df.set_index(['id','iteration']).stack()#.reset_index().rename(columns={'level_2':'metric', 0: 'value'})

puts each value on its own row, but I want to keep both values of a tuple (foo and bar) in the same row. My attempt:

# column names match the DataFrame defined above (re_foo, re_bar, ...)
dx = df[['id', 're_foo', 're_bar', 'iteration']].copy()
dx['kind'] = 'regular'
dx = pd.concat([dx, df[['id', 're_foo_baz', 're_bar_baz', 'iteration']]], axis=0)
dx['kind'] = dx['kind'].fillna('baz')
dx.loc[dx.re_foo.isnull(), 're_foo'] = dx.re_foo_baz
# now fill other NULL values

fails instead with:

ValueError: cannot reindex from a duplicate axis
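
For context, the error most likely comes from index alignment: pd.concat keeps the original row labels, so 0, 1, 2 each appear twice, and the .loc assignment cannot align the right-hand Series against duplicated labels. Newer pandas versions may word the message slightly differently. The sketch below assumes the DataFrame defined above; the names regular, baz, stacked and has_duplicates are illustrative only, and ignore_index=True is one possible way around the problem, not necessarily the best one.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    're_foo': [1, 2, 3],
    're_bar': [4, 5, 6],
    're_foo_baz': [0.4, 0.8, 0.9],
    're_bar_baz': [0.4, 0.5, 0.6],
    'iteration': [1, 2, 3],
})

# Illustrative helper frames, one per kind
regular = df[['id', 're_foo', 're_bar', 'iteration']].assign(kind='regular')
baz = df[['id', 're_foo_baz', 're_bar_baz', 'iteration']].assign(kind='baz')

# Without ignore_index=True both halves keep their labels 0, 1, 2,
# so every label appears twice in the concatenated frame.
stacked = pd.concat([regular, baz], axis=0)
has_duplicates = stacked.index.has_duplicates

# ignore_index=True renumbers the rows 0..5, so the index-aligned
# assignments below are unambiguous.
dx = pd.concat([regular, baz], axis=0, ignore_index=True)
dx.loc[dx['re_foo'].isnull(), 're_foo'] = dx['re_foo_baz']
dx.loc[dx['re_bar'].isnull(), 're_bar'] = dx['re_bar_baz']
dx = dx.drop(columns=['re_foo_baz', 're_bar_baz'])
```

With a unique index the assignment fills the missing re_foo and re_bar values from the corresponding _baz columns.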

Edit

I came across a smarter approach using fillna:

dx.re_foo = dx.re_foo.fillna(dx.re_foo_baz)
dx.re_bar = dx.re_bar.fillna(dx.re_bar_baz)
dx = dx.drop(['re_foo_baz', 're_bar_baz'], axis=1)

This gets the job done, but it still feels really clumsy. Is there a better way?

Best answer

My approach is to extract the relevant parts of the column names and then stack:

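# Move the identifier columns into the index so only metric columns remain
s = df.set_index(['id', 'iteration'])

# Split each column name into (metric, kind); names without a suffix get 'regular'
s.columns = pd.MultiIndex.from_frame(s.columns
                                     .str.extract('([^_]*_[^_]*)_?([^_]*)')
                                     .replace('', 'regular')
                                    )

# Stack the kind level (level 1) into rows and restore id/iteration as columns
s.stack(1).reset_index()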

Output:

0  id  iteration        1  re_bar  re_foo
0   1          1      baz     0.4     0.4
1   1          1  regular     4.0     1.0
2   2          2      baz     0.5     0.8
3   2          2  regular     5.0     2.0
4   3          3      baz     0.6     0.9
5   3          3  regular     6.0     3.0
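
To see what the column-splitting step produces on its own, here is a minimal sketch that applies the same regular expression to just the four metric column names. The variable names cols and parts are illustrative and not part of the answer.

```python
import pandas as pd

cols = pd.Index(['re_foo', 're_bar', 're_foo_baz', 're_bar_baz'])

# Group 1: two underscore-separated tokens (the metric name, e.g. 're_foo').
# Group 2: an optional third token (the kind, e.g. 'baz'); it matches the
# empty string when the column name has no suffix.
parts = cols.str.extract(r'([^_]*_[^_]*)_?([^_]*)')

# Columns without a suffix are labelled 'regular'.
parts = parts.replace('', 'regular')
```

Each row of parts is a (metric, kind) pair, which is what pd.MultiIndex.from_frame turns into the two column levels before stacking.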

This question (python - pandas reshape of tuples - cannot reindex from a duplicate axis) comes from Stack Overflow: https://stackoverflow.com/questions/60377346/
