python - Adding tf-idf values as columns in a matrix

Tags: python pandas scikit-learn tf-idf

from sklearn.feature_extraction.text import TfidfVectorizer

item = list(df['item1']) + list(df['item2'])
tfidf = TfidfVectorizer()
tfidf_sp = tfidf.fit_transform(item)

# Build one entry per row; range() is required because an int is not iterable
new_list = []
for i in range(len(df['item1'])):
    new_list.append(tfidf.idf_)
df['updated_item'] = new_list

I am trying to add the tf-idf scores as features. Is this the right approach?

item1 has shape (400k,), and item2 has the same shape. tfidf_sp has shape (800k, 100k).

Best answer

import pandas as pd

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
pd.DataFrame(tfidf_sp, columns=tfidf.get_feature_names_out())

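This gives you a matrix whose columns are the terms in the tf-idf vocabulary and whose rows hold the tf-idf values for each item.

Hope this helps.

Edit:

Try converting the resulting term-document matrix to an array, as follows: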

tfidf_sp = tfidf.fit_transform(item).toarray()

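This should resolve the pandas error, since pd.DataFrame expects array-like data rather than a SciPy sparse matrix.

Regarding "python - Adding tf-idf values as columns in a matrix", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50985135/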
