我有下表:
In [303]: table.head()
Out[303]:
people weekday weekofyear
2012-01-01 119 6 52
2012-01-02 76 0 1
2012-01-03 95 1 1
2012-01-04 102 2 1
2012-01-05 87 3 1
我想创建一个简单的pd.DataFrame
,其中:
- 列 = [1, 2, ..., 52] (weekofyear)
- 行 = [0, 1, ..., 6](工作日)
- 值 =
np.sum
我尝试使用 pd.pivot_table
这给了我预期的结果:
In [308]: p = pd.pivot_table(table, index=["weekday"], columns=["weekofyear"], values=["people"], aggfunc=[np.sum])
...: p
...:
Out[308]:
sum ... \
people ...
weekofyear 1 2 3 4 5 6 7 8 9 10 ... 43 44
weekday ...
0 162 86 84 95 92 98 108 102 97 87 ... 108 86
1 95 113 88 78 108 112 98 104 87 105 ... 85 82
2 102 70 93 82 103 80 103 85 82 96 ... 87 105
3 87 91 101 83 91 100 100 80 89 86 ... 87 91
4 111 91 110 103 93 116 110 99 78 77 ... 83 102
5 117 107 99 88 97 90 100 91 97 88 ... 103 110
6 92 95 90 86 91 103 98 100 89 96 ... 94 101
weekofyear 45 46 47 48 49 50 51 52
weekday
0 99 92 99 83 107 106 93 107
1 105 83 101 93 102 89 113 84
2 96 84 110 83 104 84 84 116
3 87 96 87 88 88 83 113 93
4 93 81 104 108 72 101 109 97
5 81 107 97 89 86 108 113 101
6 93 92 93 91 89 96 93 226
[7 rows x 52 columns]
但是我没有拥有我的weekofyears列,而是陷入了无法摆脱的多重索引。如下图:
In [309]: p.columns
Out[309]:
MultiIndex(levels=[['sum'], ['people'], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]],
names=[None, None, 'weekofyear']
虽然索引看起来不错:
In [311]: p.index
Out[311]: Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64', name='weekday'
我尝试使用 unstack()
和 reset_index()
函数,但没有成功。
我错过了什么吗?
最佳答案
您应该尝试为它们提供单个值,而不是为 values
和 aggfunc
提供列表。示例 -
p = pd.pivot_table(table, index=["weekday"], columns=["weekofyear"], values="people", aggfunc=np.sum)
演示 -
In [3]: table
Out[3]:
people weekday weekofyear
2012-01-01 119 6 52
2012-01-02 76 0 1
2012-01-03 95 1 1
2012-01-04 102 2 1
2012-01-05 87 3 1
In [12]: p = pd.pivot_table(table, index=["weekday"], columns=["weekofyear"], values="people", aggfunc=np.sum)
In [13]: p
Out[13]:
weekofyear 1 52
weekday
0 76 NaN
1 95 NaN
2 102 NaN
3 87 NaN
6 NaN 119
In [14]: p.columns
Out[14]: Int64Index([1, 52], dtype='int64', name='weekofyear')
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
与 values
的情况类似,尽管文档中没有具体提及
关于python - Pivot_table 到列的多重索引,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/32654495/