我有 df:
pd.DataFrame({'period': {0: pd.Timestamp('2016-05-01 00:00:00'),
1: pd.Timestamp('2017-05-01 00:00:00'),
2: pd.Timestamp('2018-03-01 00:00:00'),
3: pd.Timestamp('2018-04-01 00:00:00'),
4: pd.Timestamp('2016-05-01 00:00:00'),
5: pd.Timestamp('2017-05-01 00:00:00'),
6: pd.Timestamp('2016-03-01 00:00:00'),
7: pd.Timestamp('2016-04-01 00:00:00')},
'cost2': {0: 15,
1: 144,
2: 44,
3: 34,
4: 13,
5: 11,
6: 12,
7: 13},
'rev2': {0: 154,
1: 13,
2: 33,
3: 37,
4: 15,
5: 11,
6: 12,
7: 13},
'cost1': {0: 19,
1: 39,
2: 53,
3: 16,
4: 19,
5: 11,
6: 12,
7: 13},
'rev1': {0: 34,
1: 34,
2: 74,
3: 22,
4: 34,
5: 11,
6: 12,
7: 13},
'destination': {0: 'YYZ',
1: 'YYZ',
2: 'YYZ',
3: 'YYZ',
4: 'DFW',
5: 'DFW',
6: 'DFW',
7: 'DFW'},
'source': {0: 'SFO',
1: 'SFO',
2: 'SFO',
3: 'SFO',
4: 'MIA',
5: 'MIA',
6: 'MIA',
7: 'MIA'}})
df = df[['source','destination','period','rev1','rev2','cost1','cost2']]
看起来像:
我希望最终的 df 具有以下列:
2017-05-01 2016-05-01
source, destination, rev1, rev2, cost1, cost2, rev1, rev2, cost1, cost2...
因此,本质上,对于每个源/目的地对,我希望在一行中显示每个日期的收入和成本数字。
我一直在修改 stack 和 unstack,但未能实现我的目标。
最佳答案
您可以使用 set_index
+ unstack
将 long 更改为 width,然后使用 swaplevel
更改您需要的列索引的格式
df.set_index(['destination','source','period']).unstack().swaplevel(0,1,axis=1).sort_index(level=0,axis=1)
关于python - pandas stack/unstack 使用交换级别 reshape df,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49833843/