python - How to use pandas to create a column that stores the count of first occurrences within a group?

Tags: python python-3.x pandas dataframe pandas-groupby

Q1. Given the data frame below, I am trying to count, for each month, the IDs that appear for the first time, plus another column that gives the number of IDs that already existed before that month.

ID     Date
1    Jan-2020
2    Feb-2020
3    Feb-2020
1    Mar-2020
2    Mar-2020
3    Mar-2020
4    Apr-2020
5    Apr-2020

Expected output, with the count of newly added unique IDs per month and the total count of IDs that existed before that month:

Date       ID_Count   Existing_count
Jan-2020      1           0
Feb-2020      2           1
Mar-2020      0           3
Apr-2020      2           3

Note: ID_Count for Mar-2020 is zero because IDs 1, 2, and 3 were all present in previous months.

Note: Existing_count is 0 for Jan-2020 because no IDs existed before January. It is 1 for Feb-2020 because only one ID (ID 1) existed before February. Mar-2020 has an existing count of 3 because it adds up the new IDs from Jan and Feb (1 + 2), and so on.

Best answer

I think you can do it like this:

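import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 1, 2, 3, 4, 5],
    'Date': ['Jan-2020', 'Feb-2020', 'Feb-2020', 'Mar-2020',
             'Mar-2020', 'Mar-2020', 'Apr-2020', 'Apr-2020'],
})

# Parse the month labels so groups sort chronologically, not alphabetically
df['month'] = pd.to_datetime(df['Date'], format='%b-%Y')

# Flag the first occurrence of each ID (assumes rows are in chronological order)
df['new'] = df.groupby('ID').cumcount() == 0

# Count new IDs by month
df_ct = df.groupby('month')['new'].sum().to_frame(name='ID_Count')

# Existing IDs = running total of new IDs from all earlier months
df_ct['Existing_cnt'] = df_ct['ID_Count'].shift().cumsum().fillna(0).astype(int)
df_ct.index = df_ct.index.strftime('%b-%Y')
df_ct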

Output:

          ID_Count  Existing_cnt
month                           
Jan-2020         1             0
Feb-2020         2             1
Mar-2020         0             3
Apr-2020         2             3
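
The answer's cumcount() step relies on the rows already being in chronological order. If that is not guaranteed, one possible variation is to look up the earliest month for each ID instead. The following is only a sketch under that assumption, not the answerer's method: the helper name count_new_and_existing is hypothetical, and the frame is assumed to have the ID and Date columns from the question.

```python
import pandas as pd


def count_new_and_existing(df):
    """Hypothetical helper: per-month counts of new and previously seen IDs."""
    out = df.copy()
    out["month"] = pd.to_datetime(out["Date"], format="%b-%Y")

    # Earliest month in which each ID appears, independent of row order.
    first_seen = out.groupby("ID")["month"].min()

    # Every month present in the data, in chronological order.
    months = pd.DatetimeIndex(out["month"].unique()).sort_values()

    # Number of IDs first seen in each month; months with no new IDs get 0.
    id_count = first_seen.value_counts().reindex(months, fill_value=0)

    result = pd.DataFrame({"ID_Count": id_count.astype(int)})
    # IDs first seen in any strictly earlier month.
    result["Existing_cnt"] = result["ID_Count"].cumsum() - result["ID_Count"]
    result.index = result.index.strftime("%b-%Y")
    return result


df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2, 3, 4, 5],
    "Date": ["Jan-2020", "Feb-2020", "Feb-2020", "Mar-2020",
             "Mar-2020", "Mar-2020", "Apr-2020", "Apr-2020"],
})

# Shuffle the rows to show that the result does not depend on row order.
print(count_new_and_existing(df.sample(frac=1, random_state=0)))
```

Subtracting ID_Count from its own cumulative sum gives the same numbers as the shift().cumsum() step in the answer, without needing fillna(0).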

Regarding "python - How to use pandas to create a column that stores the count of first occurrences within a group?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63402326/
