python pandas time series: counting the number of previous matches

Tags: python pandas aggregate-functions

Given:

import pandas as pd

applications = pd.DataFrame({'application_id': [1, 2, 3, 4, 5],
                    'date': ['2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09'],
                    'client_employer': ['company A', 'company B', 'company C', 'company A', 'company B'],
                    'client_name': ['Bill', 'John', 'Steve', 'Bill', 'Alex']})

Table:

         date client_employer client_name
0  2015-01-05       company A        John
1  2015-01-06       company B        Bill
2  2015-01-07       company B        Bill
3  2015-01-08       company A       Sarah
4  2015-01-09       company B        Alex
5  2015-01-10       company B       Brian

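For each application, how many different people working for the same employer have we seen in the past? The solution should not use loops.

Desired output: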

         date client_employer client_name  employers_count
0  2015-01-05       company A        John                0
1  2015-01-06       company B        Bill                0
2  2015-01-07       company B        Bill                0
3  2015-01-08       company A       Sarah                1
4  2015-01-09       company B        Alex                1
5  2015-01-10       company B       Brian                2

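The suggested solution does not work correctly when a client appears more than once (compare cnt with cnt_desired):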

import numpy as np
import pandas as pd

applications = pd.DataFrame({'application_id': [1, 2, 3, 4, 5, 6],
                        'date': ['2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09', '2015-01-10'],
                        'client_employer': ['company B', 'company B', 'company B', 'company B', 'company B', 'company B'],
                        'client_name': ['Bill', 'John', 'Steve', 'Bill', 'Alex', 'Bill'],
                        'cnt_desired': [0, 1, 2, 2, 3, 3]})

# Map each name to the position of its first appearance within its employer group
emp_count = applications.groupby(['client_employer'])['client_name'].transform(
    lambda x: x.map(dict(zip(x.unique(), np.arange(len(x.unique()))))))
applications['cnt'] = emp_count

   application_id         date client_employer client_name  cnt_desired  cnt
0               1   2015-01-05       company B        Bill            0    0
1               2   2015-01-06       company B        John            1    1
2               3   2015-01-07       company B       Steve            2    2
3               4   2015-01-08       company B        Bill            2    0
4               5   2015-01-09       company B        Alex            3    3
5               6   2015-01-10       company B        Bill            3    0

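Best answer

First group by client_employer, then select the client_name column and use transform with map. The map uses a dict whose keys are the unique client_name values in the group and whose values are range(number of unique values):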

df['employers_count'] = df.groupby(['client_employer'])['client_name'].transform(lambda x: x.map(dict(zip(x.unique(),range(x.nunique())))))

         date client_employer client_name  employers_count
0  2015-01-05       company A        John                0
1  2015-01-06       company B        Bill                0
2  2015-01-07       company B        Bill                0
3  2015-01-08       company A       Sarah                1
4  2015-01-09       company B        Alex                1
5  2015-01-10       company B       Brian                2
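
For the case raised in the question where a client name repeats (the cnt_desired column), one possible alternative, which is not part of the answer above, is to flag the first appearance of each client per employer and take a running total within the employer. This is a minimal sketch. It assumes rows should be ordered by date and that the current client does not count toward their own total. count_prior_clients is a hypothetical helper name introduced only for illustration.

```python
import pandas as pd


def count_prior_clients(frame):
    """Hypothetical helper: distinct other clients seen earlier at the same employer."""
    # ISO-formatted date strings sort chronologically; mergesort keeps ties stable.
    ordered = frame.sort_values('date', kind='mergesort')
    # 1 on the first row of each (employer, client) pair, 0 on repeats.
    first_seen = (~ordered.duplicated(['client_employer', 'client_name'])).astype(int)
    # The running total counts distinct clients up to and including this row,
    # so subtract 1 to leave out the current client.
    counts = first_seen.groupby(ordered['client_employer']).cumsum() - 1
    # Realign to the caller's original row order.
    return counts.reindex(frame.index)


# Data from the question where the same client applies more than once.
applications = pd.DataFrame({
    'application_id': [1, 2, 3, 4, 5, 6],
    'date': ['2015-01-05', '2015-01-06', '2015-01-07',
             '2015-01-08', '2015-01-09', '2015-01-10'],
    'client_employer': ['company B'] * 6,
    'client_name': ['Bill', 'John', 'Steve', 'Bill', 'Alex', 'Bill'],
    'cnt_desired': [0, 1, 2, 2, 3, 3],
})
applications['cnt'] = count_prior_clients(applications)

# Data from the table at the top of the question.
df = pd.DataFrame({
    'date': ['2015-01-05', '2015-01-06', '2015-01-07',
             '2015-01-08', '2015-01-09', '2015-01-10'],
    'client_employer': ['company A', 'company B', 'company B',
                        'company A', 'company B', 'company B'],
    'client_name': ['John', 'Bill', 'Bill', 'Sarah', 'Alex', 'Brian'],
})
df['employers_count'] = count_prior_clients(df)
```

On these two inputs the sketch reproduces cnt_desired and the employers_count column from the desired output. It relies on the observation that the current client is always among the distinct clients seen so far, so the number of other distinct clients is the running distinct count minus one.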

This topic, counting the number of previous matches in a pandas time series, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/52513402/
