python - How to compute a similarity matrix from a list in Python

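I have a problem with my data.

I use Python to read data from a database and assign it to a variable called data.

type(data) is list; more precisely, it is a list of tuples, one per row: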

data = [(1, 'Shirt', 2),(1, 'Pants', 3),(2, 'Top', 2),(2, 'Shirt', 1),(2, 'T-Shirt', 4), (3, 'Shirt', 3),(3, 'T-Shirt', 2)]

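In each row, data[i][0] is the unique_id, data[i][1] is the category_product, and data[i][2] is the count.

I need to compute the cosine similarity between unique_id 1 and 2 based on category_product (I plan to use scipy).

There can be more than two unique_id values.

I think I need to convert my data into a matrix: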

unique_id | Shirt | Pants | Top | T-Shirt
1 | 2 | 3 | 0 | 0 
2 | 1 | 0 | 2 | 4
3 | 3 | 0 | 0 | 2
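
The conversion described above can be sketched in plain Python, without any third-party library. This is only an illustration, assuming each (unique_id, category_product) pair appears at most once in data; the names categories, ids and matrix are made up for the example.

```python
data = [(1, 'Shirt', 2), (1, 'Pants', 3), (2, 'Top', 2), (2, 'Shirt', 1),
        (2, 'T-Shirt', 4), (3, 'Shirt', 3), (3, 'T-Shirt', 2)]

# Column order: categories in order of first appearance in data.
categories = list(dict.fromkeys(cat for _, cat, _ in data))
ids = sorted({uid for uid, _, _ in data})

# Start with one row of zeros per unique_id, then fill in the known counts.
# Assumption: no (unique_id, category_product) pair is repeated.
matrix = {uid: [0] * len(categories) for uid in ids}
for uid, cat, count in data:
    matrix[uid][categories.index(cat)] = count

print(categories)
for uid in ids:
    print(uid, matrix[uid])
```

Missing combinations stay at 0, which is what the table above shows for, e.g., unique_id 1 and Top.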

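I want to compute the cosine similarity between every pair of rows in this matrix. The expected output (id, id, similarity) is: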

1,2,0.121045506534
1,3,0.461538461538
2,3,0.665750285936

Sim(1,2) = 0.121045506534
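
For reference, the cosine similarity of two vectors is their dot product divided by the product of their Euclidean norms. A small sketch that reproduces Sim(1,2) by hand from the matrix rows; cosine_similarity is a hypothetical helper written for this example, not a library function, and it assumes neither vector is all zeros.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: neither vector is all zeros, otherwise this divides by zero.
    return dot / (norm_a * norm_b)

# Rows for unique_id 1 and 2, columns ordered Shirt, Pants, Top, T-Shirt.
row_1 = [2, 3, 0, 0]
row_2 = [1, 0, 2, 4]

sim_12 = cosine_similarity(row_1, row_2)
print(round(sim_12, 6))  # 0.121046
```

Here the dot product is 2 and the norms are sqrt(13) and sqrt(21), giving 2 / sqrt(273).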

How can I do this in Python?

Thanks

Best answer

import pandas as pd
from scipy import spatial
from itertools import combinations

df = pd.DataFrame(data, columns=['unique_id', 'category_product', 'count'])

pt = df.pivot(index='unique_id', columns='category_product', values='count').fillna(0)

>>> pt
category_product  Pants  Shirt  T-Shirt  Top
unique_id                                   
1                     3      2        0    0
2                     0      1        4    2
3                     0      3        2    0

combos = combinations(pt.index, 2)
>>> # .ix was removed in pandas 1.0; .loc selects rows by index label instead.
>>> [(a, b, 1 - spatial.distance.cosine(pt.loc[a].values, pt.loc[b].values))
     for a, b in combos]
[(1, 2, 0.12104550653376045),
 (1, 3, 0.46153846153846168),
 (2, 3, 0.66575028593568275)]
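
As a possible variation on the answer above, all pairwise similarities can be computed in one call with scipy.spatial.distance.pdist instead of looping over pairs. This is a sketch rather than part of the original answer. It also swaps pivot for pivot_table with aggfunc='sum', on the assumption that repeated (unique_id, category_product) rows should be added together; pivot raises an error on such duplicates.

```python
from itertools import combinations

import pandas as pd
from scipy.spatial.distance import pdist

data = [(1, 'Shirt', 2), (1, 'Pants', 3), (2, 'Top', 2), (2, 'Shirt', 1),
        (2, 'T-Shirt', 4), (3, 'Shirt', 3), (3, 'T-Shirt', 2)]
df = pd.DataFrame(data, columns=['unique_id', 'category_product', 'count'])

# pivot_table aggregates duplicate pairs (summed here) and fills gaps with 0.
pt = df.pivot_table(index='unique_id', columns='category_product',
                    values='count', aggfunc='sum', fill_value=0)

# pdist returns condensed pairwise distances in the same order as
# combinations(range(n), 2); cosine similarity is 1 - cosine distance.
sims = 1 - pdist(pt.to_numpy(dtype=float), metric='cosine')

result = [(int(a), int(b), float(s))
          for (a, b), s in zip(combinations(pt.index, 2), sims)]
for row in result:
    print(row)
```

With many unique_id values this avoids one Python-level call per pair, at the cost of holding the whole dense matrix in memory.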

A similar question on Stack Overflow: https://stackoverflow.com/questions/36345324/
