Given:
import pandas as pd
applications = pd.DataFrame({'application_id': [1, 2, 3, 4, 5],
    'date': ['2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09'],
    'client_employer': ['company A', 'company B', 'company C', 'company A', 'company B'],
    'client_name': ['Bill', 'John', 'Steve', 'Bill', 'Alex']})
Table:
         date  client_employer  client_name
0  2015-01-05        company A         John
1  2015-01-06        company B         Bill
2  2015-01-07        company B         Bill
3  2015-01-08        company A        Sarah
4  2015-01-09        company B         Alex
5  2015-01-10        company B        Brian
For each row, how many different people from the same employer have we seen in the past? The solution should avoid loops.
Desired output:
         date  client_employer  client_name  employers_count
0  2015-01-05        company A         John                0
1  2015-01-06        company B         Bill                0
2  2015-01-07        company B         Bill                0
3  2015-01-08        company A        Sarah                1
4  2015-01-09        company B         Alex                1
5  2015-01-10        company B        Brian                2
The suggested solution does not produce the desired counts (compare cnt with cnt_desired):
import numpy as np
import pandas as pd
applications = pd.DataFrame({'application_id': [1, 2, 3, 4, 5, 6],
    'date': ['2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09', '2015-01-10'],
    'client_employer': ['company B', 'company B', 'company B', 'company B', 'company B', 'company B'],
    'client_name': ['Bill', 'John', 'Steve', 'Bill', 'Alex', 'Bill'],
    'cnt_desired': [0, 1, 2, 2, 3, 3]})
# Map each name to the index of its first appearance within its employer group
emp_count = applications.groupby(['client_employer'])['client_name'].transform(lambda x: x.map(dict(zip(x.unique(), np.arange(len(x.unique()))))))
applications['cnt'] = emp_count
   application_id        date  client_employer  client_name  cnt_desired  cnt
0               1  2015-01-05        company B         Bill            0    0
1               2  2015-01-06        company B         John            1    1
2               3  2015-01-07        company B        Steve            2    2
3               4  2015-01-08        company B         Bill            2    0
4               5  2015-01-09        company B         Alex            3    3
5               6  2015-01-10        company B         Bill            3    0
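A minimal sketch of why cnt falls back to 0 for the repeated name, using only the client_name values from this example. The names `names` and `mapping` are illustrative and not part of the original post. The dict built from unique() gives each name the position of its first appearance, so every later 'Bill' maps back to the same value regardless of how many other names were seen in between.

```python
import numpy as np
import pandas as pd

# client_name values from the six-row example (a single employer group)
names = pd.Series(['Bill', 'John', 'Steve', 'Bill', 'Alex', 'Bill'])

# unique() keeps order of first appearance, so each name gets a fixed rank
mapping = dict(zip(names.unique(), np.arange(len(names.unique()))))

# Every occurrence of a name receives that fixed rank, including repeats
cnt = names.map(mapping)
print(mapping)
print(cnt.tolist())
```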
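Best answer
First use groupby on client_employer, then access the client_name column and use map to transform it based on a dict whose keys are the unique client_name values and whose values are a range over the number of unique values, so that each name receives the index of its first appearance within its employer group: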
df['employers_count'] = df.groupby(['client_employer'])['client_name'].transform(lambda x: x.map(dict(zip(x.unique(), range(x.nunique())))))
         date  client_employer  client_name  employers_count
0  2015-01-05        company A         John                0
1  2015-01-06        company B         Bill                0
2  2015-01-07        company B         Bill                0
3  2015-01-08        company A        Sarah                1
4  2015-01-09        company B         Alex                1
5  2015-01-10        company B        Brian                2
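The mapping above reproduces the desired output for this table, but not for the six-row cnt_desired example in the question, where a name reappears after new names have been seen. One possible alternative, offered as a sketch and not as the original answer's method, is to flag the first occurrence of each (employer, name) pair and take a running total per employer. It assumes the rows are already sorted by date; `first_seen` is an illustrative name.

```python
import pandas as pd

# Six-row example from the question, including the desired counts
applications = pd.DataFrame({
    'application_id': [1, 2, 3, 4, 5, 6],
    'date': ['2015-01-05', '2015-01-06', '2015-01-07',
             '2015-01-08', '2015-01-09', '2015-01-10'],
    'client_employer': ['company B'] * 6,
    'client_name': ['Bill', 'John', 'Steve', 'Bill', 'Alex', 'Bill'],
    'cnt_desired': [0, 1, 2, 2, 3, 3],
})

# True on the first row where a given (employer, name) pair appears.
# Assumption: rows are already in chronological order.
first_seen = ~applications.duplicated(['client_employer', 'client_name'])

# Running number of distinct names per employer, minus the current one
applications['cnt'] = (
    first_seen.astype(int)
    .groupby(applications['client_employer'])
    .cumsum()
    - 1
)
print(applications[['client_name', 'cnt_desired', 'cnt']])
```

On the six-row table from the answer the same two steps give 0, 0, 0, 1, 1, 2, which agrees with the desired employers_count column.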
A similar question about counting the number of previous matches in a pandas time series can be found on Stack Overflow: https://stackoverflow.com/questions/52513402/