python - Dropping values below the group mean with a Pandas DataFrame

I have a DataFrame/Series with a MultiIndex. Here is code to generate one:

import pandas as pd

index = pd.MultiIndex.from_product([['a', 'a', 'b', 'b'], ['c', 'c', 'd', 'd']], names=['first', 'second'])
s = pd.Series(range(16), index=index)

"s" then looks like this:

In [139]: pd.Series(range(16), index=index)
Out[139]: 
first  second
a      c          0
       c          1
       d          2
       d          3
       c          4
       c          5
       d          6
       d          7
b      c          8
       c          9
       d         10
       d         11
       c         12
       c         13
       d         14
       d         15
dtype: int64

How can I drop the values that are below the group mean (originally, below 20% of the group mean)?

In [140]: s.groupby(level=[0,1]).mean()
Out[140]: 
first  second
a      c          2.5
       d          4.5
b      c         10.5
       d         12.5
dtype: float64

The "dumb" way is to loop over the frame (iterrows) and compare the values one by one. There must be a smarter Pandas way, using something like apply/join/etc. I am very new to Pandas.

Best answer

IIUC, you can use transform for this:

>>> s.loc[s >= s.groupby(level=[0,1]).transform("mean")]
first  second
a      c          4
       c          5
       d          6
       d          7
b      c         12
       c         13
       d         14
       d         15
dtype: int64

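transform takes the result of a groupby reduction, here mean, and broadcasts it back so that it matches the original index, which means we can use it to build a boolean mask: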

>>> s.groupby(level=[0,1]).transform("mean")
first  second
a      c          2.5
       c          2.5
       d          4.5
       d          4.5
       c          2.5
[and so on]
>>> s >= s.groupby(level=[0,1]).transform("mean")
first  second
a      c         False
       c         False
       d         False
       d         False
       c          True
[and so on]
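
To make the difference between a plain reduction and transform concrete, here is a minimal sketch on the same data. The helper keep_at_or_above and its frac parameter are hypothetical names introduced only for illustration; frac=0.2 is one possible reading of the "20% of the group mean" threshold mentioned in the question, not something the answer itself specifies.

```python
import pandas as pd

# Same data as in the question.
index = pd.MultiIndex.from_product(
    [["a", "a", "b", "b"], ["c", "c", "d", "d"]], names=["first", "second"]
)
s = pd.Series(range(16), index=index)

grouped = s.groupby(level=[0, 1])
reduced = grouped.mean()               # one row per (first, second) group
broadcast = grouped.transform("mean")  # one row per original element


def keep_at_or_above(series, frac=1.0):
    """Hypothetical helper: keep values >= frac * mean of their own group."""
    group_mean = series.groupby(level=[0, 1]).transform("mean")
    return series[series >= frac * group_mean]


kept_mean = keep_at_or_above(s)             # same filter as in the answer
kept_20pct = keep_at_or_above(s, frac=0.2)  # assumed reading of the 20% case
```

Because broadcast shares its index with s, the comparison lines up element by element, which is what allows the result to be used directly as a mask.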

I might also simply write s.groupby(s.index).transform("mean"), but that is more a matter of preference.
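
Since the question mentions a DataFrame as well as a Series, the same idea can be sketched with the group keys held in ordinary columns. This is an illustrative variant rather than part of the original answer; the column name "value" is an arbitrary choice.

```python
import pandas as pd

# Same data as in the question.
index = pd.MultiIndex.from_product(
    [["a", "a", "b", "b"], ["c", "c", "d", "d"]], names=["first", "second"]
)
s = pd.Series(range(16), index=index)

# Move the index levels into columns; "value" is an arbitrary column name.
df = s.rename("value").reset_index()

# Per-row group mean, aligned with df's own row index.
group_mean = df.groupby(["first", "second"])["value"].transform("mean")

# Keep only the rows at or above the mean of their group.
filtered = df[df["value"] >= group_mean]
```

Here the grouping is done by column name instead of by index level, but transform plays the same role: it returns one mean per row, so the comparison yields a row-wise boolean mask.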

For more on "python - Dropping values below the group mean with a Pandas DataFrame", see the similar question on Stack Overflow: https://stackoverflow.com/questions/29147443/
