I found a solution that handles this row by row, but is there a fast way to do it column-wise?
Here is a quick example of the data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame([['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['SF',15.0],
['GB',5.7],
[1.0,np.nan],
['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['BUF',7.0],
['GB',5.7],
[2.0,np.nan]], columns = ['team','points'])
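I have been trying df['sum'] = df['points'].cumsum(). It obviously computes the cumulative sum, but what I need is for it to restart when/if it reaches a NaN, rather than just skipping over it.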
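To make the behavior described above concrete, here is a minimal sketch on a made-up Series (the values and the names s and plain are illustrative, not taken from the question's data). With its default settings, cumsum leaves the NaN in place but keeps accumulating across it:

```python
import numpy as np
import pandas as pd

# Illustrative values only: two numbers, a NaN separator, two more numbers.
s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0])

# Default cumsum skips the NaN: the running total carries on after it.
plain = s.cumsum()
print(plain.tolist())  # [1.0, 3.0, nan, 6.0, 10.0]
```

The desired result would instead be 3.0 and 7.0 for the last two rows.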
Best answer
Use GroupBy.cumsum, grouping by a helper Series built by testing for missing values and taking another cumsum of that mask:
df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
print(df)
    team  points     sum
0     GB   43.76   43.76
1    TEN   17.30   61.06
2    ARI    0.20   61.26
3    ATL   12.30   73.56
4    HOU   21.10   94.66
5    ARI    1.70   96.36
6    ATL   12.60  108.96
7     SF   15.00  123.96
8     GB    5.70  129.66
9      1     NaN     NaN
10    GB   43.76   43.76
11   TEN   17.30   61.06
12   ARI    0.20   61.26
13   ATL   12.30   73.56
14   HOU   21.10   94.66
15   ARI    1.70   96.36
16   ATL   12.60  108.96
17   BUF    7.00  115.96
18    GB    5.70  121.66
19     2     NaN     NaN
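It may help to see the helper Series on its own. The sketch below uses a small made-up Series (points, key and restarted are illustrative names, not part of the answer) to show how isna().cumsum() increments at every NaN and so labels each run of rows with its own group id:

```python
import numpy as np
import pandas as pd

# Illustrative values only: NaN rows act as separators between runs.
points = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, np.nan, 5.0])

# isna() marks the separator rows; cumsum() of that boolean mask
# goes up by one at each NaN, giving a group id per run.
key = points.isna().cumsum()
print(key.tolist())  # [0, 0, 1, 1, 1, 2, 2]

# Cumulative sum within each group, so the total restarts after every NaN.
restarted = points.groupby(key).cumsum()
print(restarted.tolist())  # [1.0, 3.0, nan, 3.0, 7.0, nan, 5.0]
```

Each NaN row opens a new group and stays NaN in the result, which is why the running total begins again on the row after it.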
Regarding "python - pandas fillna with cumulative value of previous rows in a column (reset after each nan)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59718443/