我有以下 pandas
数据框
import pandas as pd
import numpy as np
pd.np.random.seed(1)
N = 5
data = pd.DataFrame(pd.np.random.rand(N, 3), columns=['Monday', 'Wednesday', 'Friday'])
data['State'] = 'ST' + pd.Series((pd.np.arange(N) % 19).astype(str))
print data
Monday Wednesday Friday State
0 0.417022 0.720324 0.000114 ST0
1 0.302333 0.146756 0.092339 ST1
2 0.186260 0.345561 0.396767 ST2
3 0.538817 0.419195 0.685220 ST3
4 0.204452 0.878117 0.027388 ST4
我想将此数据框转换为
0 ST0 Monday 0.417022
Wednesday 0.7203245
Friday 0.0001143748
1 ST1 Monday 0.3023326
Wednesday 0.1467559
Friday 0.09233859
2 ST2 Monday 0.1862602
Wednesday 0.3455607
Friday 0.3967675
State ST2
3 ST3 Monday 0.5388167
Wednesday 0.4191945
Friday 0.6852195
State ST3
4 ST4 Monday 0.2044522
Wednesday 0.8781174
Friday 0.02738759
State ST4
如果单独使用data.stack()
,它会给出类似的内容,
0 Monday 0.417022
Wednesday 0.7203245
Friday 0.0001143748
State ST0
1 Monday 0.3023326
Wednesday 0.1467559
Friday 0.09233859
State ST1
2 Monday 0.1862602
Wednesday 0.3455607
Friday 0.3967675
State ST2
3 Monday 0.5388167
Wednesday 0.4191945
Friday 0.6852195
State ST3
4 Monday 0.2044522
Wednesday 0.8781174
Friday 0.02738759
State ST4
在这里,我如何在多重索引中选择State
列作为第一级,并选择第二级的其他列。
最佳答案
您只需在堆叠之前将 State 列移至索引即可:
data.set_index('State', append=True).stack()
Out[4]:
State
0 ST0 Monday 0.417022
Wednesday 0.720324
Friday 0.000114
1 ST1 Monday 0.302333
Wednesday 0.146756
Friday 0.092339
2 ST2 Monday 0.186260
Wednesday 0.345561
Friday 0.396767
3 ST3 Monday 0.538817
Wednesday 0.419195
Friday 0.685220
4 ST4 Monday 0.204452
Wednesday 0.878117
Friday 0.027388
dtype: float64
请注意,这与您发布的输出并不完全匹配,我没有将状态与日期一起包含在内,因为我认为这种方式更明智,如果您真的希望它像原始输出一样,那就是:data.set_index('State',append=True,drop=False).stack()
关于python - 使用堆栈函数转换 pandas 数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30884447/