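python - How to see all the documents for each topic in LDA?

Tags: python python-3.x scikit-learn lda topic-modeling

I am using LDA to find the topics of a large text. I managed to print the topics, but I would like to print each text together with its topic.

Data: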

it's very hot outside summer
there are not many flowers in winter
in the winter we eat hot food
in the summer we go to the sea
in winter we used many clothes
in summer we are on vacation
winter and summer are two seasons of the year

I tried using sklearn and I can print the topics, but I would like to print all the phrases that belong to each topic:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np
import pandas

dataset = pandas.read_csv('data.csv', encoding = 'utf-8')
comments = dataset['comments']
comments_list = comments.values.tolist()

vect = CountVectorizer()
X = vect.fit_transform(comments_list)

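# n_components replaces the n_topics parameter, which newer scikit-learn releases no longer accept
lda = LatentDirichletAllocation(n_components = 2, learning_method = "batch", max_iter = 25, random_state = 0)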

document_topics = lda.fit_transform(X)

sorting = np.argsort(lda.components_, axis = 1)[:, ::-1]
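# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is its replacement
feature_names = np.array(vect.get_feature_names_out())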

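# comments_list is a plain list and cannot be indexed with [:, 1];
# rank documents by their weight for the second topic (column 1) instead, highest first
docs = np.argsort(document_topics[:, 1])[::-1]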
for i in docs[:4]:
    print(' '.join(i) + '\n')

Desired output:

Topic 1
it's very hot outside summer
in the summer we go to the sea
in summer we are on vacation
winter and summer are two seasons of the year

Topic 2
there are not many flowers in winter
in the winter we eat hot food
in winter we used many clothes
winter and summer are two seasons of the year

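Best answer

If you want to print the documents, you need to specify them: index comments_list with each document index instead of joining the index itself.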

print(" ".join(comments_list[i].split(",")[:2]) + "\n")
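
To get the grouped output shown in the question, one option is to assign each document to a topic from the rows of document_topics. The following is only a minimal sketch, not the answerer's method. The helper name docs_by_topic, its threshold parameter, and the example_weights values are assumptions made for illustration; the weights are hand-written stand-ins so the snippet runs on its own, and they are not real LDA output.

```python
# Sample sentences from the question, inlined instead of read from data.csv.
comments_list = [
    "it's very hot outside summer",
    "there are not many flowers in winter",
    "in the winter we eat hot food",
    "in the summer we go to the sea",
    "in winter we used many clothes",
    "in summer we are on vacation",
    "winter and summer are two seasons of the year",
]

# Hypothetical document-topic weights (one row per document, one column per
# topic). In the question's code this would be the array returned by
# lda.fit_transform(X); the numbers below are made up for illustration.
example_weights = [
    [0.90, 0.10],
    [0.12, 0.88],
    [0.30, 0.70],
    [0.91, 0.09],
    [0.11, 0.89],
    [0.88, 0.12],
    [0.52, 0.48],
]


def docs_by_topic(doc_topic, docs, threshold=None):
    """Group documents by topic index.

    threshold=None: each document is listed under its single heaviest topic.
    threshold=t:    a document is listed under every topic with weight >= t,
                    so one document may appear under several topics.
    """
    n_topics = len(doc_topic[0])
    groups = {k: [] for k in range(n_topics)}
    for doc, weights in zip(docs, doc_topic):
        if threshold is None:
            # Index of the largest weight in this row.
            best = max(range(n_topics), key=lambda k: weights[k])
            groups[best].append(doc)
        else:
            for k in range(n_topics):
                if weights[k] >= threshold:
                    groups[k].append(doc)
    return groups


# Print the documents under each topic, allowing shared membership.
for k, docs in docs_by_topic(example_weights, comments_list, threshold=0.4).items():
    print("Topic", k + 1)
    print("\n".join(docs) + "\n")
```

With real data, pass document_topics in place of example_weights; the helper only indexes rows and columns, so it should accept a NumPy array as well as nested lists. The threshold variant is there because the desired output lists the last sentence under both topics, which a single most-probable-topic assignment cannot produce. On seven short sentences the fitted weights may not split as cleanly as the stand-in values do.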

Regarding "python - How to see all the documents for each topic in LDA?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51694637/
