python - Doc2vec : How to get document vectors

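How do I get document vectors for two text documents using Doc2vec? I am new to this, so it would be helpful if someone could point me in the right direction or to a tutorial.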

I am using gensim.

doc1=["This is a sentence","This is another sentence"]
documents1=[doc.strip().split(" ") for doc in doc1 ]
model = doc2vec.Doc2Vec(documents1, size = 100, window = 300, min_count = 10, workers=4)

I get

AttributeError: 'list' object has no attribute 'words'

whenever I run this.

Best answer

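If you want to train a Doc2Vec model, each document in your data set needs to contain a list of words (similar to the Word2Vec format) and tags (the IDs of the document). Passing bare lists of words, as in the question, is what raises the AttributeError, because Doc2Vec looks for a words attribute on every document. The data set can also contain some additional information (see https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb for more information).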
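To make the required shape concrete, here is a minimal sketch that uses only the standard library, so it runs without gensim. The name `AnalyzedDocument` is an illustrative stand-in, not a gensim class; gensim also ships a `TaggedDocument` class with the same two fields, `words` and `tags`.

```python
from collections import namedtuple

# Illustrative stand-in for the document objects Doc2Vec iterates over:
# any object exposing a `words` list and a `tags` list.
AnalyzedDocument = namedtuple("AnalyzedDocument", "words tags")

raw_texts = ["This is a sentence", "This is another sentence"]

# The shape used in the question: one bare list of tokens per document.
plain_lists = [text.split() for text in raw_texts]

# The shape Doc2Vec expects: tokens plus a tag that identifies the document.
tagged_docs = [
    AnalyzedDocument(words=text.lower().split(), tags=[i])
    for i, text in enumerate(raw_texts)
]

# A plain list has no `words` attribute, which is what the error reports.
print(hasattr(plain_lists[0], "words"))
print(tagged_docs[1].words, tagged_docs[1].tags)
```

The integer tag is what you later use to look up the trained vector for that document.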

# Import libraries

from gensim.models import doc2vec
from collections import namedtuple

# Load data

doc1 = ["This is a sentence", "This is another sentence"]

# Transform data (you can add more data preprocessing steps) 

docs = []
analyzedDocument = namedtuple('AnalyzedDocument', 'words tags')
for i, text in enumerate(doc1):
    words = text.lower().split()
    tags = [i]
    docs.append(analyzedDocument(words, tags))

# Train model (min_count = 1 keeps every word, which the tiny example data set needs)
# vector_size replaces the older size argument, which newer gensim releases no longer accept

model = doc2vec.Doc2Vec(docs, vector_size = 100, window = 300, min_count = 1, workers = 4)

# Get the vectors by tag (in gensim 4.x the same vectors are exposed as model.dv)

model.docvecs[0]
model.docvecs[1]
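Once you have the two vectors, a common next step is to compare them. The sketch below computes cosine similarity in plain Python. The function name `cosine_similarity` and the short vectors `vec0` and `vec1` are hypothetical placeholders for the two 100-dimensional vectors returned above; they are not real model output.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical placeholder values, not output from a trained model.
vec0 = [0.1, -0.2, 0.3]
vec1 = [0.1, -0.1, 0.4]

similarity = cosine_similarity(vec0, vec1)
print(similarity)
```

With only two short sentences as training data the vectors are close to random, so treat any similarity value from this toy setup as illustrative only.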

Update (how to train in epochs): That example became outdated, so I deleted it. For more information about training in epochs, see this answer or @gojomo's comment.

Source: the original question "python - Doc2vec : How to get document vectors" on Stack Overflow: https://stackoverflow.com/questions/31321209/
