machine-learning - GraphLab: don't recommend items the user already owns

Tags: machine-learning recommendation-engine graphlab

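When creating a GraphLab recommender model, what should I specify so that items a user already owns are not recommended to that user again? Can this be done directly by setting some parameter, or do I need to write a recommender from scratch? The data looks like this: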

| user_id    |  item_id    | othercolumns |
|:-----------|------------:|:------------:|
| 1          |     21      |     This     | 
| 2          |     22      |     column   |
| 1          |     23      |     will     |
| 3          |     24      |     hold     |
| 2          |     25      |     other    |
| 1          |     26      |     values   |

Since items 21, 23 and 26 are already owned by user 1, they should not be recommended to that user.

Best answer

This behavior is controlled by the exclude_known parameter of the recommender.recommend method (doc).

exclude_known : bool, optional

By default, all user-item interactions previously seen in the training data, or in any new data provided using new_observation_data, are excluded from the recommendations. Passing in exclude_known = False overrides this behavior.
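To make the effect of the flag concrete, here is a minimal plain-Python sketch of the filtering step only. It does not use GraphLab and says nothing about how GraphLab implements or scores recommendations; the helper names `known_items_by_user` and `candidate_items` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Interactions from the question, as (user_id, item_id) pairs.
interactions = [(1, 21), (2, 22), (1, 23), (3, 24), (2, 25), (1, 26)]


def known_items_by_user(pairs):
    """Map each user to the set of items already seen for that user."""
    known = defaultdict(set)
    for user_id, item_id in pairs:
        known[user_id].add(item_id)
    return known


def candidate_items(pairs, user_id, exclude_known=True):
    """Return the sorted items that may be recommended to user_id.

    Hypothetical helper: it mimics only the exclusion step, not the scoring.
    """
    all_items = {item_id for _, item_id in pairs}
    if not exclude_known:
        return sorted(all_items)
    return sorted(all_items - known_items_by_user(pairs)[user_id])


print(candidate_items(interactions, 1))                       # [22, 24, 25]
print(candidate_items(interactions, 1, exclude_known=False))  # [21, 22, 23, 24, 25, 26]
```

For user 1 the default call drops items 21, 23 and 26, which is the behavior the question asks for.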

Example

>>> import graphlab as gl
>>> sf = gl.SFrame({'user_id':[1,2,1,3,2,1], 'item_id':[21,22,23,24,25,26]})
>>> print sf
+---------+---------+
| item_id | user_id |
+---------+---------+
|    21   |    1    |
|    22   |    2    |
|    23   |    1    |
|    24   |    3    |
|    25   |    2    |
|    26   |    1    |
+---------+---------+
[6 rows x 2 columns]
>>> rec_model = gl.recommender.create(sf)
>>> # recommend only items the user does not already own (the default, exclude_known=True)
>>> rec_wo_own_item = rec_model.recommend(sf['user_id'].unique())
>>> rec_wo_own_item.sort('user_id').print_rows(100)
+---------+---------+----------------+------+
| user_id | item_id |     score      | rank |
+---------+---------+----------------+------+
|    1    |    22   |      0.0       |  1   |
|    1    |    24   |      0.0       |  2   |
|    1    |    25   |      0.0       |  3   |
|    2    |    21   |      0.0       |  1   |
|    2    |    23   |      0.0       |  2   |
|    2    |    24   |      0.0       |  3   |
|    2    |    26   |      0.0       |  4   |
|    3    |    21   | 0.333333333333 |  1   |
|    3    |    23   | 0.333333333333 |  2   |
|    3    |    26   | 0.333333333333 |  3   |
|    3    |    22   | 0.166666666667 |  4   |
|    3    |    25   | 0.166666666667 |  5   |
+---------+---------+----------------+------+
[12 rows x 4 columns]
>>> # also allow items the user already owns to be recommended
>>> rec_w_own_item = rec_model.recommend(sf['user_id'].unique(), exclude_known=False)
>>> rec_w_own_item.sort('user_id').print_rows(100)
+---------+---------+----------------+------+
| user_id | item_id |     score      | rank |
+---------+---------+----------------+------+
|    1    |    21   | 0.666666666667 |  1   |
|    1    |    23   | 0.666666666667 |  2   |
|    1    |    26   | 0.666666666667 |  3   |
|    1    |    22   |      0.0       |  4   |
|    1    |    24   |      0.0       |  5   |
|    1    |    25   |      0.0       |  6   |
|    2    |    26   |      0.0       |  6   |
|    2    |    24   |      0.0       |  5   |
|    2    |    23   |      0.0       |  4   |
|    2    |    21   |      0.0       |  3   |
|    2    |    25   |      0.5       |  2   |
|    2    |    22   |      0.5       |  1   |
|    3    |    24   |      0.0       |  6   |
|    3    |    25   | 0.166666666667 |  5   |
|    3    |    22   | 0.166666666667 |  4   |
|    3    |    26   | 0.333333333333 |  3   |
|    3    |    23   | 0.333333333333 |  2   |
|    3    |    21   | 0.333333333333 |  1   |
+---------+---------+----------------+------+
[18 rows x 4 columns]
>>> # attach each user's recommended (not yet owned) items to the original SFrame
>>> rec = rec_wo_own_item.groupby('user_id', {'reco':gl.aggregate.CONCAT('item_id')})
>>> sf = sf.join(rec, 'user_id', 'left')
>>> print sf
+---------+---------+----------------------+
| item_id | user_id |         reco         |
+---------+---------+----------------------+
|    21   |    1    |     [24, 25, 22]     |
|    22   |    2    |   [24, 26, 23, 21]   |
|    23   |    1    |     [24, 25, 22]     |
|    24   |    3    | [21, 23, 26, 25, 22] |
|    25   |    2    |   [24, 26, 23, 21]   |
|    26   |    1    |     [24, 25, 22]     |
+---------+---------+----------------------+
[6 rows x 3 columns]
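The last step of the session (group the recommendations per user, then left-join them onto the original rows) can be sketched in plain Python as follows. This is only an illustration of the data flow, not GraphLab code: `concat_by_user` and `left_join` are hypothetical helpers, and the recommendation pairs are copied from the exclude_known output above. The lists here keep input order, whereas the order inside the `reco` lists printed by GraphLab differs, as the output above shows.

```python
from collections import defaultdict

# (user_id, item_id) rows from the original data.
rows = [(1, 21), (2, 22), (1, 23), (3, 24), (2, 25), (1, 26)]

# (user_id, item_id) recommendations, copied from the default
# (exclude_known=True) output shown in the session above.
recommendations = [
    (1, 22), (1, 24), (1, 25),
    (2, 21), (2, 23), (2, 24), (2, 26),
    (3, 21), (3, 23), (3, 26), (3, 22), (3, 25),
]


def concat_by_user(pairs):
    """Collect the recommended items of each user into one list."""
    reco = defaultdict(list)
    for user_id, item_id in pairs:
        reco[user_id].append(item_id)
    return dict(reco)


def left_join(data_rows, reco):
    """Attach the per-user list to every row; users without one get None."""
    return [(item_id, user_id, reco.get(user_id)) for user_id, item_id in data_rows]


reco = concat_by_user(recommendations)
for row in left_join(rows, reco):
    print(row)
```

Each printed tuple has the form (item_id, user_id, reco), mirroring the columns of the final SFrame.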

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/39314181/
