Suppose I have a DataFrame with a MultiIndex in Pandas, like this:
                    A         B         C
X   Y     Z
bar one   a -0.007381 -0.365315 -0.024817
          b -1.219794  0.370955 -0.795125
baz three a  0.145578  1.428502 -0.408384
          b -0.249321 -0.292967 -1.849202
    two   a -0.249321 -0.292967 -1.849202
    four  a  0.21     -0.967123  1.202234
foo one   b -1.046479 -1.250595  0.781722
          a  1.314373  0.333150  0.133331
qux one   c  0.716789  0.616471 -0.298493
    two   b  0.385795 -0.915417 -1.367644
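To make the code in this thread reproducible, the example frame can be rebuilt as follows. The index tuples and values are copied from the table above, and the level names X, Y, Z come from its header. This is a minimal sketch, not necessarily how the asker created the frame.

```python
import pandas as pd

# Index tuples copied from the question's table; X, Y, Z are the level names.
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "one", "a"), ("bar", "one", "b"),
        ("baz", "three", "a"), ("baz", "three", "b"),
        ("baz", "two", "a"), ("baz", "four", "a"),
        ("foo", "one", "b"), ("foo", "one", "a"),
        ("qux", "one", "c"), ("qux", "two", "b"),
    ],
    names=["X", "Y", "Z"],
)
# One row of A, B, C values per index tuple, in the same order.
values = [
    [-0.007381, -0.365315, -0.024817],
    [-1.219794, 0.370955, -0.795125],
    [0.145578, 1.428502, -0.408384],
    [-0.249321, -0.292967, -1.849202],
    [-0.249321, -0.292967, -1.849202],
    [0.21, -0.967123, 1.202234],
    [-1.046479, -1.250595, 0.781722],
    [1.314373, 0.333150, 0.133331],
    [0.716789, 0.616471, -0.298493],
    [0.385795, -0.915417, -1.367644],
]
df = pd.DataFrame(values, index=index, columns=["A", "B", "C"])
print(df)
```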
I would like to know:
1. The leaf size of every value at every level. In the example above, this would be:
bar: 2
bar & one: 2
bar & one & a: 1
bar & one & b: 1
baz: 4
baz & three: 2
baz & three & a: 1
baz & three & b: 1
etc.
2. The relative size between consecutive levels. In the example above, this would be:
# First level -> Second level:
bar: 1 (i.e. grouping ["one"])
baz: 3 (i.e. grouping ["three", "two", "four"])
foo: 1 (i.e. grouping ["one"])
qux: 2 (i.e. grouping ["one", "two"])
# Second level -> Third level
...
# Third level -> Fourth level (if we had one)
etc.
Is there a way to do this in Pandas, ideally getting the results back in a DataFrame as well?
Best answer
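OK, since you've added another part, I'll flesh out my answer. For part 1, I would use a list comprehension to loop over the different grouping levels and get the sizes of all the groups, then use concat to combine the Series produced by each groupby:
import pandas as pd
print(pd.concat([df.groupby(level=x).size() for x in [0, [0, 1], [0, 1, 2]]]))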
bar 2
baz 4
foo 2
qux 2
(bar, one) 2
(baz, four) 1
(baz, three) 2
(baz, two) 1
(foo, one) 2
(qux, one) 1
(qux, two) 1
(bar, one, a) 1
(bar, one, b) 1
(baz, four, a) 1
(baz, three, a) 1
(baz, three, b) 1
(baz, two, a) 1
(foo, one, a) 1
(foo, one, b) 1
(qux, one, c) 1
(qux, two, b) 1
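If you would rather have labels in the question's "bar & one" style than tuples, the same loop can join each group key into a string. A minimal Python 3 sketch follows. Only the index matters for group sizes, so the values are replaced by a dummy column A, and the helper name leaf_sizes is hypothetical.

```python
import pandas as pd

# Only the index affects group sizes, so a dummy column stands in for A, B, C.
idx = pd.MultiIndex.from_tuples(
    [("bar", "one", "a"), ("bar", "one", "b"),
     ("baz", "three", "a"), ("baz", "three", "b"),
     ("baz", "two", "a"), ("baz", "four", "a"),
     ("foo", "one", "b"), ("foo", "one", "a"),
     ("qux", "one", "c"), ("qux", "two", "b")],
    names=["X", "Y", "Z"],
)
df = pd.DataFrame({"A": range(len(idx))}, index=idx)


def leaf_sizes(frame):
    """Count the rows under every prefix of the row MultiIndex (hypothetical helper)."""
    parts = []
    for depth in range(1, frame.index.nlevels + 1):
        # Group by the first `depth` levels: [0], then [0, 1], then [0, 1, 2].
        sizes = frame.groupby(level=list(range(depth))).size()
        # A single-level key comes back as a plain string, deeper keys as tuples.
        labels = [" & ".join(k if isinstance(k, tuple) else (k,)) for k in sizes.index]
        parts.append(pd.Series(sizes.to_numpy(), index=labels))
    return pd.concat(parts)


print(leaf_sizes(df))
```

The result has one entry per node of the hierarchy: 4 first-level groups, 7 two-level groups and 10 leaves.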
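Part 2 is more complex, but I think we can use the same structure. There are probably many ways to do this, but I'll use the ngroups attribute inside the same basic list comprehension: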
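def group_count(df, x):
    # Group column A by the coarser level spec x[0] ...
    by = df['A'].groupby(level=x[0])
    # ... and count how many distinct groups the finer spec x[1] forms inside each group
    return by.agg(lambda g: g.groupby(level=x[1]).ngroups)

lvl = [0, [0, 1], [0, 1, 2]]
# Pair each level spec with the next one: (0, [0, 1]) and ([0, 1], [0, 1, 2])
print(pd.concat([group_count(df, x) for x in zip(lvl[:-1], lvl[1:])]))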
bar 1
baz 3
foo 1
qux 2
(bar, one) 2
(baz, four) 1
(baz, three) 2
(baz, two) 1
(foo, one) 2
(qux, one) 1
(qux, two) 1
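Part 2 can also be phrased without the nested groupby: the number of child groups under a prefix equals the number of distinct values at the next index level within that prefix, so nunique on a frame of the index levels gives the same counts. This is a swapped-in alternative, not the approach used above; child_counts is a hypothetical name, and the values are again replaced by a dummy column because only the index matters.

```python
import pandas as pd

# Only the index matters here, so a dummy column stands in for A, B, C.
idx = pd.MultiIndex.from_tuples(
    [("bar", "one", "a"), ("bar", "one", "b"),
     ("baz", "three", "a"), ("baz", "three", "b"),
     ("baz", "two", "a"), ("baz", "four", "a"),
     ("foo", "one", "b"), ("foo", "one", "a"),
     ("qux", "one", "c"), ("qux", "two", "b")],
    names=["X", "Y", "Z"],
)
df = pd.DataFrame({"A": range(len(idx))}, index=idx)


def child_counts(frame):
    """For each prefix of length k, count the distinct level-k values beneath it."""
    levels = frame.index.to_frame(index=False)  # one column per index level: X, Y, Z
    names = list(levels.columns)
    parts = []
    for depth in range(1, len(names)):
        # e.g. depth=1: for each X, how many distinct Y values occur?
        parts.append(levels.groupby(names[:depth])[names[depth]].nunique())
    return pd.concat(parts)


print(child_counts(df))
```

Like the ngroups version, the concatenated index mixes plain first-level labels with tuples.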
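Of course, you may not like having tuples in the index. If so, you can reset the index inside the list comprehension; for part 1, for example, this gives:
lvl = [0, [0, 1], [0, 1, 2]]
print(pd.concat([df.groupby(level=x).size().reset_index() for x in lvl]))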
   0    X      Y    Z
0  2  bar    NaN  NaN
1  4  baz    NaN  NaN
2  2  foo    NaN  NaN
3  2  qux    NaN  NaN
0  2  bar    one  NaN
1  1  baz   four  NaN
2  2  baz  three  NaN
3  1  baz    two  NaN
4  2  foo    one  NaN
5  1  qux    one  NaN
6  1  qux    two  NaN
0  1  bar    one    a
1  1  bar    one    b
2  1  baz   four    a
3  1  baz  three    a
4  1  baz  three    b
5  1  baz    two    a
6  1  foo    one    a
7  1  foo    one    b
8  1  qux    one    c
9  1  qux    two    b
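Since the question also asks for the results in a DataFrame, both parts can be collected into a single table with one row per node: its depth, its leaf count (part 1) and its number of direct children (part 2). This is a sketch under assumptions; hierarchy_summary and the column names depth, size and n_children are hypothetical choices, and the values are replaced by a dummy column because only the index matters.

```python
import pandas as pd

# Only the index matters here, so a dummy column stands in for A, B, C.
idx = pd.MultiIndex.from_tuples(
    [("bar", "one", "a"), ("bar", "one", "b"),
     ("baz", "three", "a"), ("baz", "three", "b"),
     ("baz", "two", "a"), ("baz", "four", "a"),
     ("foo", "one", "b"), ("foo", "one", "a"),
     ("qux", "one", "c"), ("qux", "two", "b")],
    names=["X", "Y", "Z"],
)
df = pd.DataFrame({"A": range(len(idx))}, index=idx)


def hierarchy_summary(frame):
    """One row per index prefix with its depth, row count and direct-child count."""
    levels = frame.index.to_frame(index=False)  # columns X, Y, Z
    names = list(levels.columns)
    parts = []
    for depth in range(1, len(names) + 1):
        keys = names[:depth]
        grouped = levels.groupby(keys)
        # Part 1: number of rows (leaves) under each prefix.
        part = grouped.size().rename("size").reset_index()
        if depth < len(names):
            # Part 2: number of distinct values at the next level under each prefix.
            children = grouped[names[depth]].nunique().rename("n_children").reset_index()
            part = part.merge(children, on=keys)
        else:
            part["n_children"] = 0  # leaves have no children
        part["depth"] = depth
        parts.append(part)
    # Deeper level columns are NaN on rows for shallower prefixes.
    return pd.concat(parts, ignore_index=True)[names + ["depth", "size", "n_children"]]


print(hierarchy_summary(df))
```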
Regarding python - Hierarchical group sizes in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23401406/