I am trying to calculate the duration of events using a for loop with a condition.
import pandas as pd

mydf = {'Duration': [14, 8, 6, 36, 12, 5, 3, 2, 4, 5, 8, 3, 14, 1, 27, 25, 117, 2, 962, 2, 2, 1],
        'Activity': ['Groom', 'Pause', 'Groom', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Awaken', 'Groom', 'Pause', 'Groom', 'Eat', 'Cuddle', 'Come down', 'Dig', 'Forage']}
df = pd.DataFrame(mydf)
I want to add all the Sleep durations together.
I have tried:
Sleep_sum = [sum(df['Duration'] for i in df['Activity'] if [i+1]=='Sleep')]
But this gives me an error: TypeError: can only concatenate str (not "int") to str.
I have also tried this:
for i in range of len(df):
df['Activity'][i] == 'Sleep'
if [i] = [i+1]
df['Duration'].sum()
Essentially, I need to sum the durations of the 'Sleep' rows whenever one is followed by another row that is == 'Sleep'.
Thank you for your time!
Best answer
You can create virtual groups to group consecutive events (Sleep and others):
# The group key increases by 1 each time Activity differs from the previous row
out = (df.groupby(df['Activity'].ne(df['Activity'].shift()).cumsum(), as_index=False)
         .agg({'Duration': 'sum', 'Activity': 'first'}))
print(out)
# Output
    Duration   Activity
0         14      Groom
1          8      Pause
2          6      Groom
3         92      Sleep
4          1     Awaken
5         27      Groom
6         25      Pause
7        117      Groom
8          2        Eat
9        962     Cuddle
10         2  Come down
11         2        Dig
12         1     Forage
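To see why that group key works, it can help to look at the intermediate steps on a shortened column. The following is only an illustrative sketch: the seven-row `activity` series and the names `changed`, `run_id` and `steps` are made up for this example and are not part of the original answer.

```python
import pandas as pd

# Shortened, hypothetical sample of the Activity column
activity = pd.Series(['Groom', 'Pause', 'Groom', 'Sleep', 'Sleep', 'Sleep', 'Awaken'])

# Compare each row with the previous one; a change marks the start of a new run
changed = activity.ne(activity.shift())

# Cumulative sum of the change flags: rows in the same consecutive run share one ID
run_id = changed.cumsum()

# Side-by-side view of the intermediate columns
steps = pd.DataFrame({'Activity': activity, 'changed': changed, 'run_id': run_id})
print(steps)
```

The three consecutive Sleep rows end up with the same `run_id`, while the two separate Groom rows get different ones, which is what lets `groupby` sum only adjacent repeats.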
Update
For the Sleep events only:
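# Boolean mask for the Sleep rows
m = df['Activity'] == 'Sleep'
# Sum each run of consecutive Sleep rows, keeping the index of the run's first row
sleep = (df.reset_index(drop=False)[m]
           .groupby(df['Activity'].ne(df['Activity'].shift()).cumsum())
           .agg({'index': 'first', 'Duration': 'sum', 'Activity': 'first'})
           .set_index('index'))
# Put the collapsed Sleep rows back among the untouched rows, in original order
out = pd.concat([df[~m], sleep]).sort_index()
print(out)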
# Output
    Duration   Activity
0         14      Groom
1          8      Pause
2          6      Groom
3         92      Sleep
13         1     Awaken
14        27      Groom
15        25      Pause
16       117      Groom
17         2        Eat
18       962     Cuddle
19         2  Come down
20         2        Dig
21         1     Forage
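If only the numbers are needed rather than a rebuilt DataFrame, a boolean mask may be enough. This is a sketch of an alternative, not the answer's method: it reuses the question's data, and the names `is_sleep`, `total_sleep`, `run_id` and `sleep_runs` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Duration': [14, 8, 6, 36, 12, 5, 3, 2, 4, 5, 8, 3, 14, 1, 27, 25, 117, 2, 962, 2, 2, 1],
    'Activity': ['Groom', 'Pause', 'Groom', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep',
                 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Awaken', 'Groom', 'Pause',
                 'Groom', 'Eat', 'Cuddle', 'Come down', 'Dig', 'Forage'],
})

is_sleep = df['Activity'].eq('Sleep')

# Total over every Sleep row, whether or not the rows are consecutive
total_sleep = df.loc[is_sleep, 'Duration'].sum()

# One total per run of consecutive Sleep rows
run_id = df['Activity'].ne(df['Activity'].shift()).cumsum()
sleep_runs = df.loc[is_sleep, 'Duration'].groupby(run_id[is_sleep]).sum()

print(total_sleep)          # 92
print(sleep_runs.tolist())  # [92]
```

With this data there is a single Sleep run, so both results are 92; they would differ only if Sleep appeared in several separate runs.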
Regarding "python - How to sum a column based on repeated values?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73058761/