python - How to filter the results of a groupby in pandas

I am trying to filter the results of a groupby.

I have this table:

A       B       C

A0      B0      0.5
A1      B0      0.2
A2      B1      0.6
A3      B1      0.4
A4      B2      1.0
A5      B2      1.2

A is the index, and it is unique.

Second, I have this list:

['A0', 'A1', 'A4']
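
For anyone who wants to reproduce this, here is a minimal sketch of the data above as a pandas DataFrame. The variable names `x` and `priority` are assumptions chosen for illustration (the answer below also calls the frame `x`); only the values come from the table and the list.

```python
import pandas as pd

# The table from the question, with A as the (unique) index.
# `x` and `priority` are hypothetical names, not part of the original post.
x = pd.DataFrame(
    {
        "B": ["B0", "B0", "B1", "B1", "B2", "B2"],
        "C": [0.5, 0.2, 0.6, 0.4, 1.0, 1.2],
    },
    index=pd.Index(["A0", "A1", "A2", "A3", "A4", "A5"], name="A"),
)

# Index labels that should take priority within their group.
priority = ["A0", "A1", "A4"]
```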

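I want to group by B and, for each group, extract the row with the highest C value. That row must be chosen from among all rows in the group, but rows whose index appears in the list above take priority: if a group contains any such rows, the choice is made among those rows only.

For this data, the result must be: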

A       B       C

A0      B0      0.5
A2      B1      0.6
A4      B2      1.0

I think the pseudocode must be:

group by B
for each group G:
    intersect the index of group G's rows with the labels in the list
    if the intersection is not empty:
        the group G becomes the intersection
    sort the rows by C in descending order
    take the first row as the representative for this group

How can I do this in pandas?

Thanks

Best answer

Here is a general solution. It isn't pretty, but it works:

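def filtermax(g, filter_on, filter_items, max_over):
    # filter_on is kept for the call signature but is unused:
    # the filter is applied to the index of the group g.
    in_filter = g.index.isin(filter_items)
    if in_filter.any():
        # Some rows are in the priority list: pick the maximum among those only.
        # (Boolean indexing replaces the removed .ix accessor.)
        preferred = g[in_filter]
        return preferred[preferred[max_over] == preferred[max_over].max()]
    else:
        # No priority rows in this group: pick the maximum over the whole group.
        return g[g[max_over] == g[max_over].max()]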

This gives:

>>> x.groupby('B').apply(filtermax, 'A', ['A0', 'A1', 'A4'], 'C')
        B    C
B  A          
B0 A0  B0  0.5
B1 A2  B1  0.6
B2 A4  B2  1.0

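If anyone can figure out how to stop B from being added as an index (at least on my system, x.groupby('B', as_index=False) doesn't help!), then this solution would be just about perfect!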
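
One option that may help with the extra index level is the `group_keys=False` argument of `groupby`, which asks pandas not to prepend the group keys to the result of `apply`. The sketch below is self-contained and makes some assumptions: the frame `x` is rebuilt from the question's table, and `filtermax` is written with boolean indexing rather than `.ix`. Depending on the pandas version, the B column itself may or may not be passed to the function and appear in the result, and newer versions may emit a deprecation warning about this.

```python
import pandas as pd

# Rebuild the question's table (assumed layout: A is the index).
x = pd.DataFrame(
    {
        "B": ["B0", "B0", "B1", "B1", "B2", "B2"],
        "C": [0.5, 0.2, 0.6, 0.4, 1.0, 1.2],
    },
    index=pd.Index(["A0", "A1", "A2", "A3", "A4", "A5"], name="A"),
)


def filtermax(g, filter_on, filter_items, max_over):
    # filter_on is unused; the filter is applied to the index of g.
    in_filter = g.index.isin(filter_items)
    # Restrict to the priority rows when the group has any, else use all rows.
    candidates = g[in_filter] if in_filter.any() else g
    return candidates[candidates[max_over] == candidates[max_over].max()]


# group_keys=False: do not add the group key B as an extra index level.
result = x.groupby("B", group_keys=False).apply(
    filtermax, "A", ["A0", "A1", "A4"], "C"
)
print(result)
```

With this call the result is expected to keep A as its only index level, with one row per group.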

Source: a similar question on Stack Overflow about python - How to filter the results of a groupby in pandas: https://stackoverflow.com/questions/21500596/
