Build the DataFrame:
import pandas as pd

people = ['shayna','shayna','shayna','shayna','john']
dates = ['01-01-18','01-01-18','01-01-18','01-02-18','01-02-18']
places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
d = {'Person':people,'Service_Date':dates, 'Site_Where_Served':places}
df = pd.DataFrame(d)
df
Person  Service_Date  Site_Where_Served
shayna  01-01-18      hospital
shayna  01-01-18      hospital
shayna  01-01-18      inpatient
shayna  01-02-18      hospital
john    01-02-18      hospital
What I want is to count the unique (Person, Service_Date) pairs, grouped by Site_Where_Served.
Expected output:
Site_Where_Served  Site_Visit_Count
hospital           3
inpatient          1
My attempt:
df[['Person', 'Service_Date']].groupby(df['Site_Where_Served']).nunique().reset_index(name='Site_Visit_Count')
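But then it doesn't know how to reset the index. So I tried leaving that part off, and realized it isn't counting unique pairs of 'Person' and 'Service_Date' at all; each column is counted on its own, as the output shows: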
                   Person  Service_Date
Site_Where_Served
hospital                2             2
inpatient               1             1
Best answer
drop_duplicates with groupby + count
(df.drop_duplicates()
.groupby('Site_Where_Served')
.Site_Where_Served.count()
.reset_index(name='Site_Visit_Count')
)
Site_Where_Served Site_Visit_Count
0 hospital 3
1 inpatient 1
Note that one subtle difference between count and size is that the former does not count NaN entries.
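To make the count/size difference concrete, here is a minimal sketch on a small made-up frame (the name demo and its contents are hypothetical and not part of the question's data). One Person entry is deliberately NaN so the two results diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second hospital row has a missing Person value.
demo = pd.DataFrame({
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient'],
    'Person': ['shayna', np.nan, 'john'],
})

grouped = demo.groupby('Site_Where_Served')
counts = grouped['Person'].count()  # non-NaN Person entries per group
sizes = grouped.size()              # every row per group, NaN or not
```

In the answer above the counted column is the grouping key itself, and groupby drops NaN keys by default, so count and size should give the same numbers there.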
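Tuplize, groupby, and nunique
This really just fixes your current solution, but I don't recommend it: it is long-winded and takes more steps than necessary. First tuplize your columns, then group by Site_Where_Served, then count the unique tuples: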
(df[['Person', 'Service_Date']]
.apply(tuple, 1)
.groupby(df.Site_Where_Served)
.nunique()
.reset_index(name='Site_Visit_Count')
)
Site_Where_Served Site_Visit_Count
0 hospital 3
1 inpatient 1
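As a variation on the first approach, the sketch below passes an explicit subset to drop_duplicates and uses size instead of count. The subset argument is an addition that is not in the original answer; it assumes the real frame might carry extra columns that should not affect which rows count as duplicates.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

key_cols = ['Person', 'Service_Date', 'Site_Where_Served']

result = (
    df.drop_duplicates(subset=key_cols)   # keep one row per distinct (person, date, site)
      .groupby('Site_Where_Served')
      .size()                             # rows remaining per site
      .reset_index(name='Site_Visit_Count')
)
```

Because size returns a Series, reset_index(name=...) works here just as it does in the answer's count version.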
Regarding "python - Pandas equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50360326/