python - How to use MultiplyNumeric to multiply a weight derived from a date by a value from the same table?

Tags: python featuretools

My main goal is to build a feature that gives recent information a higher weight.

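The idea is to compute a weight factor with a new transform primitive, "WeightTimeUntil", which the transform primitive "MultiplyNumeric" can then use to produce a weighted value.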

I used Will Koehrsen's featuretools walkthrough as the starting point for the data and entity setup.

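In doing so, I ran into the following problems:

  1. featuretools does not generate the combination I want (see below).
  2. It looks like featuretools skips the combination because of a type mismatch?!
  3. By changing the type of the value I want to multiply by the weight factor, I managed to get the right combination, but not for the right target.
  4. With `clients` as the target entity, featuretools does not generate the combination I want at all. Only when I use the target entity `loans` (where the date and the value are both columns) does featuretools use the right combination.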

Here is the code for the "WeightTimeUntil" primitive:

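import math

import numpy as np
import pandas as pd
from featuretools.primitives import make_trans_primitive
from featuretools.variable_types import Datetime, Numeric


def weight_time_until(array, time):
    # `time` is the cutoff time (passed in because uses_calc_time=True);
    # for dates before the cutoff, diff is negative
    diff = pd.DatetimeIndex(array) - time
    # Signed difference in half-year periods, rounded down
    s = np.floor(diff.days / 365 / 0.5)
    aWidth = 9
    # Decay rate chosen so that exp(-a * (aWidth - 1)) == 0.1
    a = math.log(0.1) / (-(aWidth - 1))

    w = np.exp(-a * s)

    return w


WeightTimeUntil = make_trans_primitive(function=weight_time_until,
                                       input_types=[Datetime],
                                       return_type=Numeric,
                                       uses_calc_time=True,
                                       description="Calculates weight time until the cutoff time",
                                       name="weight_time_until")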
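The constants in the primitive appear to be chosen so that the weight falls to 0.1 after `aWidth - 1 = 8` half-year periods, i.e. w = exp(-a·s) with a = ln(10)/8. Note, however, that `pd.DatetimeIndex(array) - time` is negative for dates before the cutoff, so as written the weight grows with age. A minimal pandas-only sketch, assuming the intent is a weight that decays with age (the helper `decaying_weight` is hypothetical and not part of featuretools):

```python
import math

import numpy as np
import pandas as pd


def decaying_weight(dates, cutoff, a_width=9):
    """Hypothetical helper: a weight that decays with the age of each date.

    Age is measured in whole half-year periods before `cutoff`, so the weight
    is 1.0 within the most recent half-year and 0.1 once the age reaches
    `a_width - 1` periods.
    """
    # Positive ages for dates before the cutoff (cutoff minus date)
    age_days = (pd.Timestamp(cutoff) - pd.DatetimeIndex(dates)).days
    periods = np.floor(age_days / 365 / 0.5)   # completed half-year periods
    a = math.log(10) / (a_width - 1)           # same value as log(0.1) / -(a_width - 1)
    return np.exp(-a * periods).to_numpy()


weights = decaying_weight(["2019-06-01", "2017-06-01", "2015-06-01"], "2019-06-30")
# → approximately [1.0, 0.316, 0.1]
```

Measuring age as `cutoff - date` keeps the exponent's sign consistent with the decay rate, so the most recent records get the largest weight.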

Here is the DFS call:

import featuretools as ft
from featuretools.primitives import MultiplyNumeric

features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['sum'],
                                 trans_primitives=[WeightTimeUntil, MultiplyNumeric])

And here is the resulting feature list:

 <Feature: income>,
 <Feature: credit_score>,
 <Feature: join_month>,
 <Feature: log_income>,
 <Feature: SUM(loans.loan_amount)>,
 <Feature: SUM(loans.rate)>,
 <Feature: SUM(payments.payment_amount)>,
 <Feature: WEIGHT_TIME_UNTIL(joined)>,
 <Feature: join_month * log_income>,
 <Feature: income * log_income>,
 <Feature: income * join_month>,
 <Feature: credit_score * join_month>,
 <Feature: credit_score * log_income>,
 <Feature: credit_score * income>,
 <Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_start))>,
 <Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_end))>,
 <Feature: SUM(loans.loan_amount * rate)>,
 <Feature: income * SUM(loans.loan_amount)>,
 <Feature: credit_score * SUM(loans.loan_amount)>,
 <Feature: log_income * SUM(payments.payment_amount)>,
 <Feature: log_income * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: income * SUM(payments.payment_amount)>,
 <Feature: join_month * SUM(loans.rate)>,
 <Feature: income * SUM(loans.rate)>,
 <Feature: join_month * SUM(loans.loan_amount)>,
 <Feature: SUM(loans.rate) * SUM(payments.payment_amount)>,
 <Feature: credit_score * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.rate) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: income * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: log_income * SUM(loans.loan_amount)>,
 <Feature: SUM(loans.loan_amount) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.loan_amount) * SUM(payments.payment_amount)>,
 <Feature: credit_score * SUM(loans.rate)>,
 <Feature: log_income * SUM(loans.rate)>,
 <Feature: credit_score * SUM(payments.payment_amount)>,
 <Feature: SUM(payments.payment_amount) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: join_month * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.loan_amount) * SUM(loans.rate)>,
 <Feature: join_month * SUM(payments.payment_amount)>

I was expecting something like this:

<Feature: SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))>
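For reference, the feature being asked for is, per client, the sum over that client's loans of `loan_amount` times that loan's weight. A plain pandas sketch of the same computation on a small made-up `loans` table; the column names `client_id` and `loan_amount` are assumed from the walkthrough data, and the `weight` column is an illustrative stand-in for `WEIGHT_TIME_UNTIL(loan_start)`:

```python
import pandas as pd

# Hypothetical loans table; `weight` stands in for WEIGHT_TIME_UNTIL(loan_start),
# which is a depth-1 transform of the raw loan_start column
loans = pd.DataFrame({
    "client_id": [1, 1, 2],
    "loan_amount": [1000.0, 500.0, 2000.0],
    "weight": [1.0, 0.5, 0.1],
})

# Depth 2: per-loan product loan_amount * weight (what MultiplyNumeric does)
loans["weighted_amount"] = loans["loan_amount"] * loans["weight"]

# Depth 3: aggregate the product up to the clients level (what Sum does)
weighted_sum = loans.groupby("client_id")["weighted_amount"].sum()
# → client 1: 1250.0, client 2: 200.0
```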

Best answer

The problem here is that SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start)) is a depth-3 feature, because you are stacking Sum, MultiplyNumeric, and WeightTimeUntil. You can read more about depth in the featuretools documentation.
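Depth counts how many primitives are stacked on top of a raw column: a raw column has depth 0, and each primitive adds one to the deepest of its inputs. `ft.dfs` uses `max_depth=2` by default, which is why the depth-3 feature is never generated. A toy sketch of that counting rule (the `Node` class is hypothetical and only mimics the idea; it is not a featuretools class):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a feature: a primitive name and its inputs."""
    name: str
    inputs: List["Node"] = field(default_factory=list)

    def depth(self) -> int:
        # A raw column (no inputs) has depth 0; each primitive adds 1
        if not self.inputs:
            return 0
        return 1 + max(child.depth() for child in self.inputs)


loan_amount = Node("loan_amount")
loan_start = Node("loan_start")
weight = Node("WEIGHT_TIME_UNTIL", [loan_start])           # depth 1
product = Node("MULTIPLY_NUMERIC", [loan_amount, weight])  # depth 2
target = Node("SUM", [product])                            # depth 3
```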

You can fix this by increasing the maximum depth allowed in the call to dfs, like this:

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 max_depth=3,
                                 trans_primitives = [WeightTimeUntil, MultiplyNumeric]) 

Another option is to provide the feature as a seed feature, which does not count toward the maximum depth. You can do that like this:

seed_features=[ft.Feature(es["loans"]["loan_start"], primitive=WeightTimeUntil)]

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 seed_features=seed_features,
                                 trans_primitives = [MultiplyNumeric])

The second approach is the better choice, because it creates the feature you want while producing fewer features overall.
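One way to see why raising the depth inflates the output: `MultiplyNumeric` is commutative, so for n numeric inputs on a table DFS can form n·(n−1)/2 pairwise products. In the feature list from the question, the `clients` table has 8 numeric features of depth 0 or 1, which matches the 28 products that appear in that list. A short sketch of the count (it only enumerates pairs and does not reproduce DFS's ordering):

```python
from itertools import combinations

# Depth-0/1 numeric features of `clients`, copied from the question's feature list
numeric = [
    "income", "credit_score", "join_month", "log_income",
    "SUM(loans.loan_amount)", "SUM(loans.rate)",
    "SUM(payments.payment_amount)", "WEIGHT_TIME_UNTIL(joined)",
]

# MultiplyNumeric is commutative, so each unordered pair yields one feature
products = [f"{a} * {b}" for a, b in combinations(numeric, 2)]
print(len(products))  # → 28
```

With `max_depth=3`, depth-2 features such as the `SUM(loans....)` entries also become eligible inputs, so the number of products grows roughly quadratically with the number of inputs.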

Original question on Stack Overflow: https://stackoverflow.com/questions/56807278/
