python - Group by an object attribute in a DataFrame

I have a simple class:

class Person:

    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __repr__(self):
        return f'{self.name} [{self.age} yrs old]'

I use it to store a list of Person instances in a pandas DataFrame, like this:

import pandas as pd

people = []
people.append(Person(name='alex', age=20))
people.append(Person(name='daisy', age=20))
people.append(Person(name='john', age=30))
people.append(Person(name='alex', age=40))
cities = ['London', 'Edinburgh', 'London', 'Edinburgh']
data = dict(people=people, cities=cities)
df = pd.DataFrame(data, columns=data.keys())

Is there a way to group the DataFrame by people.age, for example?

Edit

Doing something like this:

df.groupby('people.age').groups

would output something like this:

{'20': Int64Index([0, 1], dtype='int64'), '30': Int64Index([2], dtype='int64'), '40': Int64Index([3], dtype='int64')}

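Accepted answer

pandas isn't really designed to store objects. You have a class with attributes, and pandas handles that far better as a 2D structure (one row per person, one column per attribute).

The problem with holding class instances in a column is that we have to resort to the very slow Series.apply to do anything with them. The syntax can still be concise, but the performance is poor.

In your case, we can use getattr to pull out the attribute and group by the result: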

df.groupby(df.people.apply(lambda x: getattr(x, 'age'))).groups
#{20: Int64Index([0, 1], dtype='int64'),
# 30: Int64Index([2], dtype='int64'),
# 40: Int64Index([3], dtype='int64')}
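
As a variation on the getattr approach above, the grouping key can also be built with a plain list comprehension and passed to groupby as a Series. This is only a sketch: the names `ages` and `groups_as_lists` are made up for illustration, and whether it is noticeably faster than Series.apply depends on the data size and pandas version. The index objects are converted to lists because their printed form differs between pandas versions.

```python
import pandas as pd


class Person:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


people = [Person('alex', 20), Person('daisy', 20),
          Person('john', 30), Person('alex', 40)]
cities = ['London', 'Edinburgh', 'London', 'Edinburgh']
df = pd.DataFrame({'people': people, 'cities': cities})

# Build the grouping key once in plain Python, aligned with df's index.
ages = pd.Series([p.age for p in df['people']], index=df.index, name='age')

# groupby accepts a Series of keys; .groups maps each key to row labels.
groups = df.groupby(ages).groups

# Convert keys and index objects to plain Python types for stable output.
groups_as_lists = {int(age): idx.tolist() for age, idx in groups.items()}
print(groups_as_lists)
```

The object column is still there afterwards, so any further per-attribute work would need the same kind of extraction step again.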


I recommend the following instead:

df = pd.DataFrame([x.__dict__ for x in people]).assign(cities=cities)
#   age   name     cities
#0   20   alex     London
#1   20  daisy  Edinburgh
#2   30   john     London
#3   40   alex  Edinburgh

df.groupby('age').groups
#{20: Int64Index([0, 1], dtype='int64'),
# 30: Int64Index([2], dtype='int64'),
# 40: Int64Index([3], dtype='int64')}
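
To illustrate why the flat layout is easier to work with, here is a small sketch of ordinary aggregations on it. The variable names `flat`, `counts`, and `names_by_age` are assumptions for this example and do not appear in the original answer. `vars(p)` returns the same dictionary as `p.__dict__`.

```python
import pandas as pd


class Person:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


people = [Person('alex', 20), Person('daisy', 20),
          Person('john', 30), Person('alex', 40)]
cities = ['London', 'Edinburgh', 'London', 'Edinburgh']

# One row per person, one column per attribute, plus the cities column.
flat = pd.DataFrame([vars(p) for p in people]).assign(cities=cities)

# With real columns, grouping needs no apply/getattr step.
counts = flat.groupby('age').size()
names_by_age = flat.groupby('age')['name'].agg(list)

print(counts.to_dict())
print(names_by_age.to_dict())
```

Once the attributes are columns, the usual groupby aggregations work on them directly, which is the main practical benefit of the recommended layout.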

Source: the Stack Overflow question "python - Group by an object attribute in a DataFrame": https://stackoverflow.com/questions/57204617/
