python - Pandas equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3

Tags: python pandas dataframe group-by pandas-groupby

Creating the dataframe:

import pandas as pd

people = ['shayna','shayna','shayna','shayna','john']
dates = ['01-01-18','01-01-18','01-01-18','01-02-18','01-02-18']
places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
d = {'Person':people,'Service_Date':dates, 'Site_Where_Served':places}
df = pd.DataFrame(d)
df

Person   Service_Date   Site_Where_Served
shayna   01-01-18       hospital 
shayna   01-01-18       hospital 
shayna   01-01-18       inpatient 
shayna   01-02-18       hospital 
john     01-02-18       hospital

What I want to do is count the unique pairs of Person and Service_Date, grouped by Site_Where_Served.

Expected output:

Site_Where_Served    Site_Visit_Count
hospital             3
inpatient            1

My attempt:

df[['Person', 'Service_Date']].groupby(df['Site_Where_Served']).nunique().reset_index(name='Site_Visit_Count')

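But it doesn't know how to reset the index. So I tried leaving that part out, and then realized it isn't counting unique pairs of 'Person' and 'Service_Date', because the output looks like this: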

                   Person    Service_Date
Site_Where_Served
hospital              2           2 
inpatient             1           1

Best answer

`drop_duplicates` with `groupby` + `count`

(df.drop_duplicates()
   .groupby('Site_Where_Served')
   .Site_Where_Served.count()
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
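One caveat worth sketching: `drop_duplicates()` with no arguments compares every column, which works above because the frame holds exactly the three relevant columns. If the frame carried extra columns (the `Notes` column below is hypothetical and not part of the question), passing `subset` would restrict the comparison to the columns that define a visit. A minimal sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
    # Hypothetical extra column; every value differs, so a plain
    # drop_duplicates() would not remove any row.
    'Notes': ['a', 'b', 'c', 'd', 'e'],
})

# Columns that together identify one distinct visit at one site.
key_cols = ['Person', 'Service_Date', 'Site_Where_Served']

result = (df.drop_duplicates(subset=key_cols)
            .groupby('Site_Where_Served')['Site_Where_Served']
            .count()
            .reset_index(name='Site_Visit_Count'))
print(result)
```

With the sample data this should give the same counts as the answer's version, 3 for hospital and 1 for inpatient.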

Note that one small difference between `count` and `size` is that the former does not count NaN entries.
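To illustrate that difference, here is a small sketch on made-up data (the `demo` frame and its column names are assumptions for illustration, not from the question). `count` skips missing values in the selected column, while `size` counts every row in the group:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the 'person' column.
demo = pd.DataFrame({
    'site': ['hospital', 'hospital', 'inpatient'],
    'person': ['shayna', np.nan, 'john'],
})

grouped = demo.groupby('site')['person']

counts = grouped.count()  # non-NaN values per group
sizes = grouped.size()    # all rows per group, NaN included

print(counts)
print(sizes)
```

Here the hospital group has two rows but only one non-missing `person`, so the two methods disagree for that group and agree for inpatient.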

Tuplize, `groupby` and `nunique`

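This really just fixes your current solution, but I don't recommend it: it is long-winded, with more steps than necessary. First tuplize your columns, then group by Site_Where_Served, then count the unique tuples: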

(df[['Person', 'Service_Date']]
   .apply(tuple, axis=1)
   .groupby(df.Site_Where_Served)
   .nunique()
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
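For comparison, the sketch below puts the original attempt and the tuple version side by side on the question's data. On a DataFrame, `nunique` counts distinct values in each column independently, which is why the attempt reported 2 and 2 for hospital; combining the two columns into one tuple per row makes `nunique` count distinct pairs instead. Separately, `reset_index(name=...)` is a `Series` method parameter, so it only works once the result is a single Series. The variable names `per_column`, `pairs` and `per_pair` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

# Original attempt (without reset_index): each column is counted on its own.
per_column = df[['Person', 'Service_Date']].groupby(df['Site_Where_Served']).nunique()

# Tuple version: one (Person, Service_Date) tuple per row, then count distinct tuples.
pairs = df[['Person', 'Service_Date']].apply(tuple, axis=1)
per_pair = pairs.groupby(df['Site_Where_Served']).nunique()

print(per_column)
print(per_pair)
```

For hospital there are two distinct people and two distinct dates, but three distinct (person, date) pairs, which is the number the question asks for.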

Regarding python - Pandas equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50360326/