I am trying to find consecutive zero values and have been stuck on this problem for a few hours.
I have a DataFrame like this:
Day | ID | Values
-------------------
1 | aa | 0
1 | aa | 0
1 | aa | 0
1 | aa | 0
1 | aa | 2.5
1 | aa | 2.3
1 | aa | 0
1 | aa | 0
1 | aa | 0
2 | aa | 0
2 | aa | 0
2 | aa | 2.3
2 | aa | 0
1 | bb | 0
1 | bb | 0
1 | bb | 0
1 | bb | 0
1 | bb | 3.5
I want to count the consecutive zeros, as shown below:
Day | ID | Values | consec_zeros
--------------------------------------
1 | aa | 0 | 0
1 | aa | 0 | 1
1 | aa | 0 | 2
1 | aa | 0 | 3
1 | aa | 2.5 | 4 # --> 4 consecutive 0s precede this row
1 | aa | 2.3 | 0 # 2.5 broke the run of 0s
1 | aa | 0 | 0
1 | aa | 0 | 1
1 | aa | 0 | 2
2 | aa | 0 | 0 # no 0s before this row in Day 2
2 | aa | 0 | 1
2 | aa | 2.3 | 2
2 | aa | 0 | 0
1 | bb | 0 | 0 # --> no 0s before this in ID 'bb'
1 | bb | 0 | 1
1 | bb | 0 | 2
1 | bb | 0 | 3
1 | bb | 3.5 | 4
What I tried:
g = df['Values'].ne(df['Values'].shift(1)).cumsum()
counts = df.groupby(['ID','Day',g])['Values'].transform('size')
df['consec_zeros'] = np.where(df['Values'].eq(0), counts, 0)
I am new to this, so please help and point out what I am doing wrong.
Thanks in advance.
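To see where the attempt above differs from the desired output, it may help to compare `transform('size')` with `cumcount()` on a tiny series. This is only a sketch on hypothetical toy data (one ID/Day group, values invented for illustration); the names `s`, `run_id`, `run_size`, and `run_pos` are not from the original post.

```python
import pandas as pd

# Hypothetical toy data: a single ID/Day group is enough to show the difference.
s = pd.Series([0, 0, 0, 2.5, 0, 0], name="Values")

# Label each run of equal consecutive values.
run_id = s.ne(s.shift()).cumsum()

# transform('size') repeats the total run length on every row of the run,
run_size = s.groupby(run_id).transform("size")
# whereas cumcount() numbers the rows inside each run: 0, 1, 2, ...
run_pos = s.groupby(run_id).cumcount()

print(pd.DataFrame({"Values": s, "size": run_size, "cumcount": run_pos}))
```

The `size` column carries the same number on every row of a run, so it cannot produce a running counter such as 0, 1, 2, 3; the `cumcount` column can.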
Best answer
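The main fix is to use GroupBy.cumcount instead of a group size.
The first non-zero value after a run of zeros must also receive the next counter value, so in my solution 1 is temporarily added
to the counter to distinguish the first zero of a run from rows that are not zeros: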
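import numpy as np

# label runs of equal consecutive values
g = df['Values'].ne(df['Values'].shift(1)).cumsum()
# 1-based position of each row inside its run, per ID and Day
counts = df.groupby(['ID','Day',g])['Values'].cumcount() + 1
# keep the counter only on zero rows
df['consec_zeros'] = np.where(df['Values'].eq(0), counts, 0)
# replace 0 with NaN so that non-zero rows can be forward filled
a = df['consec_zeros'].mask(df['consec_zeros'].eq(0))
# forward fill at most 1 row per group and add 1 for the first non-zero row after a run,
# then subtract the 1 that was added to the counter
df['consec_zeros'] = (np.where(a.isna(),
                               a.groupby([df['ID'],df['Day']]).ffill(limit=1) + 1,
                               df['consec_zeros']) - 1)
# rows with no zeros directly before them are still NaN: set them to 0
df['consec_zeros'] = df['consec_zeros'].fillna(0).astype(int)
print(df)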
Day ID Values consec_zeros
0 1 aa 0.0 0
1 1 aa 0.0 1
2 1 aa 0.0 2
3 1 aa 0.0 3
4 1 aa 2.5 4
5 1 aa 2.3 0
6 1 aa 0.0 0
7 1 aa 0.0 1
8 1 aa 0.0 2
9 2 aa 0.0 0
10 2 aa 0.0 1
11 2 aa 2.3 2
12 2 aa 0.0 0
13 1 bb 0.0 0
14 1 bb 0.0 1
15 1 bb 0.0 2
16 1 bb 0.0 3
17 1 bb 3.5 4
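The expected column can also be read as "the number of consecutive zeros directly before each row, within the same ID and Day". Under that reading, a shorter alternative is sketched below. It is not the method used in the answer above; `zeros_before` is a hypothetical helper name, and the sketch assumes the rows are already in the order in which the runs should be counted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day": [1] * 9 + [2] * 4 + [1] * 5,
    "ID": ["aa"] * 13 + ["bb"] * 5,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0,
               0, 0, 2.3, 0,
               0, 0, 0, 0, 3.5],
})


def zeros_before(values: pd.Series) -> pd.Series:
    """Hypothetical helper: length of the zero run directly before each row."""
    # 1 where the previous row of this group is zero, else 0
    # (the first row has no previous row, so it gets 0).
    prev_zero = values.shift().eq(0).astype(int)
    # Start a new block at every row whose previous row is not zero.
    block = prev_zero.eq(0).cumsum()
    # Running total of zero predecessors inside each block.
    return prev_zero.groupby(block).cumsum()


df["consec_zeros"] = (
    df.groupby(["ID", "Day"])["Values"].transform(zeros_before).astype(int)
)
print(df)
```

Because the helper runs separately on each (ID, Day) group, a run of zeros never carries over from one day or ID to the next, which matches the resets shown in the expected output.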
For more on python - finding consecutive values in pandas under certain conditions, see the similar question on Stack Overflow: https://stackoverflow.com/questions/56579690/