python - How do I append columns to a pandas DataFrame based on rolling-window lookups?

Tags: python pandas dataframe

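I have a 5-million-row list of customers, periods, and orders, and I need to append 3-, 6-, and 12-month rolling-window lookups. Here is a sample of the data.

I have a DataFrame, dftest1: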

{'period': {0: 201810, 1: 201811, 2: 201812, 3: 201901, 4: 201902, 5: 201903, 6: 201904, 7: 201905, 8: 201906, 9: 201907, 10: 201908, 11: 201909, 12: 201910, 13: 201911, 14: 201912, 15: 202001, 16: 202002, 17: 202003, 18: 202004, 19: 202005, 20: 202006}, 'customer': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'ABC', 6: 'ABC', 7: 'ABC', 8: 'ABC', 9: 'ABC', 10: 'ABC', 11: 'ABC', 12: 'ABC', 13: 'ABC', 14: 'ABC', 15: 'ABC', 16: 'ABC', 17: 'ABC', 18: 'ABC', 19: 'ABC', 20: 'ABC'}, 'orders': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 1.0, 12: 0.0, 13: 0.0, 14: 0.0, 15: 1.0, 16: 0.0, 17: 2.0, 18: 1.0, 19: 1.0, 20: 1.0}}

The following code runs and gives me what I want, but it takes a very long time on my full dataset:

dftest1['countOfOrdersLast3months']=''
dftest1['countOfOrdersLast6months']=''
dftest1['countOfOrdersLast12months']=''
for x, y in dftest1.iterrows():
    customer = dftest1['customer'].values[x]
    currentDate = dftest1['period'].values[x]
    dftest1['countOfOrdersLast3months'].values[x] = dftest1[(dftest1['customer']==customer) & ((dftest1['period'] > currentDate-3) & (dftest1['period'] <= currentDate))]['orders'].sum()
    dftest1['countOfOrdersLast6months'].values[x]  = dftest1[(dftest1['customer']==customer) & ((dftest1['period'] > currentDate-6) & (dftest1['period'] <= currentDate))]['orders'].sum()
    dftest1['countOfOrdersLast12months'].values[x] = dftest1[(dftest1['customer']==customer) & ((dftest1['period'] > currentDate-12) & (dftest1['period'] <= currentDate))]['orders'].sum()
dftest1.head()

Is there a better way to do this faster?

Best answer

You could try something like this:

df['cumorders'] = df.groupby('customer')['orders'].cumsum()
df['countOfOrdersLast3months'] = df.groupby('customer')['cumorders'].diff(3)

and likewise for the 6-month window, and so on.

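The first line computes the cumulative orders per customer. The second line takes a 3-step difference of cumorders, i.e. the current cumulative total minus the total three rows earlier, which is the number of orders in the last 3 rows. That equals the last 3 months as long as each customer has exactly one row per month and the rows are in period order. The first three rows of each customer have no row three steps back, so diff(3) returns NaN there.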
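To make the cumsum/diff idea concrete, here is a minimal sketch on a small made-up frame (the order values and the column names diff3 and last3 are illustrative, not from the question). It also shows one possible way to fill the leading NaN rows, by treating the missing earlier cumulative total as 0; this is an assumption about what you want for the first months, not part of the answer's code.

```python
import pandas as pd

# Made-up sample: one customer, five consecutive months of orders.
df = pd.DataFrame({
    "customer": ["ABC"] * 5,
    "orders": [1.0, 0.0, 2.0, 1.0, 3.0],
})

# Running total of orders per customer: 1, 1, 3, 4, 7
df["cumorders"] = df.groupby("customer")["orders"].cumsum()

# The answer's approach: current total minus the total three rows earlier.
# The first three rows have nothing to subtract, so they are NaN.
df["diff3"] = df.groupby("customer")["cumorders"].diff(3)

# Variant (assumption): treat a missing earlier total as 0, so the first
# rows hold the sum of whatever months exist so far.
prev = df.groupby("customer")["cumorders"].shift(3).fillna(0)
df["last3"] = df["cumorders"] - prev

print(df)
```

In this sketch diff3 is NaN for rows 0 to 2, while last3 already holds the full 3-month sum at row 2, because the window ending there is complete even though diff(3) cannot see it.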

Here is what I get from your example:

period  customer    orders  cumorders   countOfOrdersLast3months
0   201810  ABC 0.0 0.0 NaN
1   201811  ABC 0.0 0.0 NaN
2   201812  ABC 0.0 0.0 NaN
3   201901  ABC 0.0 0.0 0.0
4   201902  ABC 0.0 0.0 0.0
5   201903  ABC 0.0 0.0 0.0
6   201904  ABC 0.0 0.0 0.0
7   201905  ABC 0.0 0.0 0.0
8   201906  ABC 0.0 0.0 0.0
9   201907  ABC 0.0 0.0 0.0
10  201908  ABC 0.0 0.0 0.0
11  201909  ABC 1.0 1.0 1.0
12  201910  ABC 0.0 1.0 1.0
13  201911  ABC 0.0 1.0 1.0
14  201912  ABC 0.0 1.0 0.0
15  202001  ABC 1.0 2.0 1.0
16  202002  ABC 0.0 2.0 1.0
17  202003  ABC 2.0 4.0 3.0
18  202004  ABC 1.0 5.0 3.0
19  202005  ABC 1.0 6.0 4.0
20  202006  ABC 1.0 7.0 3.0
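If you want all three windows at once, a per-customer rolling sum is an alternative to cumsum plus diff. The sketch below is not the answer's method; it swaps in groupby with rolling, and it uses made-up order values. It assumes period is an integer in YYYYMM form, as in the question, and that a partial window should be summed rather than left as NaN (min_periods=1). Because the windows count rows rather than calendar months, it first checks that each customer has one row per consecutive month; month_index, steps, and has_gaps are hypothetical helper names.

```python
import pandas as pd

# Illustrative sample in the same shape as the question's data (values made up).
df = pd.DataFrame({
    "period": [201811, 201812, 201901, 201902, 201903, 201904],
    "customer": ["ABC"] * 6,
    "orders": [0.0, 1.0, 0.0, 2.0, 0.0, 1.0],
})

# Row-based windows only match month-based windows when the rows are sorted
# and every customer has exactly one row per month.
df = df.sort_values(["customer", "period"]).reset_index(drop=True)

# Turn YYYYMM into a running month number so 201812 -> 201901 is a step of 1.
month_index = (df["period"] // 100) * 12 + df["period"] % 100
steps = month_index.groupby(df["customer"]).diff().dropna()
has_gaps = bool((steps != 1).any())
print("missing months:", has_gaps)

# One rolling sum per window length, computed within each customer.
for months in (3, 6, 12):
    df[f"countOfOrdersLast{months}months"] = (
        df.groupby("customer")["orders"]
        .transform(lambda s, n=months: s.rolling(n, min_periods=1).sum())
    )

print(df)
```

If has_gaps comes out True, the row-based windows no longer line up with calendar months, and the data would need the missing months filled in (for example with zero orders) before either approach gives month-accurate counts.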

This question, python - How do I append columns to a pandas DataFrame based on rolling-window lookups?, corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/65012962/
