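I am trying to create labels per customer and per week. I need a labeling_function that computes sales from Monday (inclusive) to the following Monday (exclusive), but right now the windows run from Sunday to Sunday. How can I change the day on which LabelMaker starts its week?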
import composeml as cp

def total_spent(df):
    # Sum the transaction amounts within one window
    total = df['amount'].sum()
    return total

label_maker = cp.LabelMaker(
    target_entity="customer_id",
    time_index="transaction_time",
    labeling_function=total_spent,
    window_size="W",
)
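Best answer
Thanks for the question. You can get a weekly frequency anchored on Monday by setting the window size to W-MON. I'll walk through a simple example using the data below.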
import pandas as pd

# Build 15 consecutive daily records starting on Monday, 2020-11-16
records = []
for time in pd.date_range(start='2020-11-16', periods=15, freq='D'):
    record = {'transaction_time': time, 'day_name': time.day_name()}
    records.append(record)

df = pd.DataFrame(records).assign(customer_id=0)
transaction_time day_name customer_id
2020-11-16 Monday 0
2020-11-17 Tuesday 0
2020-11-18 Wednesday 0
2020-11-19 Thursday 0
2020-11-20 Friday 0
2020-11-21 Saturday 0
2020-11-22 Sunday 0
2020-11-23 Monday 0
2020-11-24 Tuesday 0
2020-11-25 Wednesday 0
2020-11-26 Thursday 0
2020-11-27 Friday 0
2020-11-28 Saturday 0
2020-11-29 Sunday 0
2020-11-30 Monday 0
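In the label maker, I set the window size to W-MON, which is the offset alias for a weekly frequency anchored on Monday. The window size also supports many other offset aliases from pandas.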
import composeml as cp

lm = cp.LabelMaker(
    target_entity='customer_id',
    time_index='transaction_time',
    window_size='W-MON',
)
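As a quick cross-check of what these aliases mean in pandas itself, the sketch below uses only pandas (not composeml) to compare the plain W alias with W-MON. The variable names are illustrative and do not come from the original answer.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Parse both aliases into pandas offset objects.
# The weekday attribute uses 0 = Monday ... 6 = Sunday.
default_week = to_offset('W')
monday_week = to_offset('W-MON')

# Generate the first three anchor dates on or after Monday, 2020-11-16.
sunday_anchors = pd.date_range(start='2020-11-16', periods=3, freq='W')
monday_anchors = pd.date_range(start='2020-11-16', periods=3, freq='W-MON')

print(default_week.weekday, list(sunday_anchors.day_name()))
print(monday_week.weekday, list(monday_anchors.day_name()))
```

On these sample dates, the W anchors fall on Sundays and the W-MON anchors fall on Mondays, which is consistent with the Sunday-to-Sunday windows described in the question.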
Let's inspect the data slices that the label maker generates. You should see a weekly frequency anchored on Monday.
slices = lm.slice(df, -1)
next(slices)
day_name customer_id
transaction_time
2020-11-16 Monday 0
2020-11-17 Tuesday 0
2020-11-18 Wednesday 0
2020-11-19 Thursday 0
2020-11-20 Friday 0
2020-11-21 Saturday 0
2020-11-22 Sunday 0
next(slices)
day_name customer_id
transaction_time
2020-11-23 Monday 0
2020-11-24 Tuesday 0
2020-11-25 Wednesday 0
2020-11-26 Thursday 0
2020-11-27 Friday 0
2020-11-28 Saturday 0
2020-11-29 Sunday 0
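To sanity-check the weekly totals the question asks for, here is a minimal pandas-only sketch (not composeml's implementation) that sums an amount column over windows running from Monday (inclusive) to the next Monday (exclusive). The amount values and the names sales and weekly_totals are assumptions made up for illustration.

```python
import pandas as pd

# Same 15 daily timestamps as the example data, with a made-up amount of 1 per day.
times = pd.date_range(start='2020-11-16', periods=15, freq='D')
sales = pd.DataFrame({'transaction_time': times, 'amount': 1, 'customer_id': 0})

# closed='left' makes each bin [Monday, next Monday);
# label='left' labels each bin by its starting Monday.
weekly_totals = (
    sales.set_index('transaction_time')['amount']
    .resample('W-MON', closed='left', label='left')
    .sum()
)
print(weekly_totals)
```

With one unit per day, each of the two full weeks totals 7, and the final Monday (2020-11-30) falls into a partial third window.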
Regarding python - changing the start day of the week in LabelMaker (composeml library), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/64874200/