python - How to filter a groupby to the first N items

In pandas, how can I modify a groupby so that it returns only the first N items of each group?

Example

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'values': [1, 2, 3, 4, 5, 6, 7]})
>>> df
   id  values
0   1       1
1   1       2
2   1       3
3   2       4
4   2       5
5   2       6
6   2       7

Desired functionality

# This doesn't work, but I am trying to return the first two items per group.
>>> df.groupby('id').first(2)  
   id  values
0   1       1
1   1       2
3   2       4
4   2       5

What I have tried

I can perform a groupby and iterate over the groups to collect the index labels of the first n rows, but there must be a simpler solution.

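n = 2  # First two rows.
# Collect the first n index labels of each group (Python 3: .values(), not .itervalues()).
idx = [i for group in df.groupby('id').groups.values() for i in group[:n]]
# .ix has been removed from pandas; use label-based .loc instead.
>>> df.loc[idx]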
   id  values
0   1       1
1   1       2
3   2       4
4   2       5

Best answer

You can use head:

In [11]: df.groupby("id").head(2)
Out[11]:
   id  values
0   1       1
1   1       2
3   2       4
4   2       5

Note: in older versions this used to be equivalent to .apply(pd.DataFrame.head), but since 0.15 it appears to be more efficient(?), as it now uses cumcount under the hood.
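
To illustrate the cumcount idea mentioned in the note, here is a minimal sketch that selects the first n rows per group by hand and compares the result with head. It is only an illustration of the technique, not pandas' actual internal code; the names position, via_cumcount and via_head are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'values': [1, 2, 3, 4, 5, 6, 7]})
n = 2  # Number of rows to keep per group.

# cumcount() numbers the rows inside each group 0, 1, 2, ...
# in their original order.
position = df.groupby('id').cumcount()

# Keep only the rows whose within-group position is below n.
via_cumcount = df[position < n]

# The built-in shortcut from the answer, for comparison.
via_head = df.groupby('id').head(n)

print(via_cumcount)
print(via_cumcount.equals(via_head))
```

The boolean-mask form can be handy when the per-group position is needed for something else as well, for example to skip the first row of each group with a condition such as (position >= 1) & (position < n).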

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/33267670/
