python - Pandas pivot table using selected values as the index

I read this excellent guide to pivoting, but I can't work out how to apply it to my case. I have tidy data like this:

>>> import pandas as pd
>>> df = pd.DataFrame({
...    'case': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', ],
...    'perf_var': ['num', 'time', 'num', 'time', 'num', 'time', 'num', 'time'],
...    'perf_value': [1, 10, 2, 20, 1, 30, 2, 40] 
...     }
...     )
>>>
>>> df
  case perf_var  perf_value
0    a      num           1
1    a     time          10
2    a      num           2
3    a     time          20
4    b      num           1
5    b     time          30
6    b      num           2
7    b     time          40

What I want is to:

use "case" as the columns

use the "num" values as the index

use the "time" values as the values.

Giving:

case a   b
1.0  10  30
2.0  20  40

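All the pivot examples I can find have the index and the values in separate columns, but the layout above seems to me like a valid, common "tidy" data case (I think?). Is it possible to pivot from here?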

Best answer

You need some preprocessing to get to the final result:

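    import numpy as np

    # Split perf_value into separate "num" and "time" columns (NaN elsewhere),
    # then fill the gaps so each row carries a complete (num, time) pair.
    (df.assign(num=np.where(df.perf_var == "num",
                            df.perf_value,
                            np.nan),
               time=np.where(df.perf_var == "time",
                             df.perf_value,
                             np.nan))
       .assign(num=lambda x: x["num"].ffill(),
               time=lambda x: x["time"].bfill())
       .loc[:, ["case", "num", "time"]]
       .drop_duplicates()
       # keyword arguments: pandas 2.0 and later reject positional ones here
       .pivot(index="num", columns="case", values="time"))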


case     a     b
num
1.0   10.0  30.0
2.0   20.0  40.0
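To make the fill step easier to follow, here is a minimal sketch that stops before the pivot and keeps the intermediate frames. The names `wide`, `filled` and `pairs` are only illustrative. Note that the approach assumes each "num" row is immediately followed by its matching "time" row, as in the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "case": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "perf_var": ["num", "time", "num", "time", "num", "time", "num", "time"],
    "perf_value": [1, 10, 2, 20, 1, 30, 2, 40],
})

# Each row holds either a num or a time; the other column is NaN.
wide = df.assign(
    num=np.where(df["perf_var"] == "num", df["perf_value"], np.nan),
    time=np.where(df["perf_var"] == "time", df["perf_value"], np.nan),
)

# Forward-fill num onto the time row below it, and back-fill time onto
# the num row above it. This relies on the num/time rows alternating.
filled = wide.assign(num=wide["num"].ffill(), time=wide["time"].bfill())

# Both rows of a pair are now identical, so drop_duplicates keeps one per pair.
pairs = filled[["case", "num", "time"]].drop_duplicates()
print(pairs)
```

If the rows were not strictly alternating, the fills could pair a "num" with the wrong "time", so it is worth checking `pairs` before pivoting.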

An alternative route to the same end point:

(
    df.set_index(["case", "perf_var"], append=True)
    .unstack()
    .droplevel(0, 1)
    .assign(num=lambda x: x.num.ffill(), 
            time=lambda x: x.time.bfill())
    .drop_duplicates()
    .droplevel(0)
    .set_index("num", append=True)
    .unstack(0)
    .rename_axis(index=None)
)
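Another possible sketch, not part of the routes above, pairs the rows with an explicit counter instead of fills. It assumes the n-th "num" within a case belongs with the n-th "time" in that case; the helper column `obs` is a hypothetical name introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "case": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "perf_var": ["num", "time", "num", "time", "num", "time", "num", "time"],
    "perf_value": [1, 10, 2, 20, 1, 30, 2, 40],
})

# obs numbers the occurrences of each perf_var within a case (0, 1, ...),
# so a num and a time that share (case, obs) form one observation.
paired = (
    df.assign(obs=df.groupby(["case", "perf_var"]).cumcount())
      .set_index(["case", "obs", "perf_var"])["perf_value"]
      .unstack("perf_var")   # one column per perf_var: num, time
      .reset_index()
)

# With num and time side by side, an ordinary pivot applies.
result = paired.pivot(index="num", columns="case", values="time")
print(result)
```

Because nothing is filled with NaN along the way, the values stay integers here, and the pairing does not depend on the "num" and "time" rows being adjacent.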

Regarding "python - Pandas pivot table using selected values as the index", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63393806/
