nlp - About LDA inference

Tags: nlp topic-modeling mallet

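I am currently using the LDA topic modeling tool from the MALLET package to do some topic detection on my documents. Everything was fine at first, and I got 20 topics out of it. However, when I tried to use the model to infer topics for a new document, the result was rather baffling.

For example, I deliberately ran my model on a document I created by hand that contains only keywords from one of the topics, "FLU", but the topic distribution I got was <0.1 for every topic. I then tried the same thing on one of the already sampled documents, in which one topic had scored 0.7. The same thing happened again.

Can anyone offer some clues as to why?

I tried asking on the MALLET mailing list, but apparently no one has replied.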

Best answer

I also know very little about MALLET, but the documentation mentions this...

Topic Inference

--inferencer-filename [FILENAME] Create a topic inference tool based on the current, trained model. Use the MALLET command bin/mallet infer-topics --help to get information on using topic inference.

Note that you must make sure that the new data is compatible with your training data. Use the option --use-pipe-from [MALLET TRAINING FILE] in the MALLET command bin/mallet import-file or import-dir to specify a training file.

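Maybe you forgot to do this? It sounds to me like the data you are training on is in a different format from the data you are testing on.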
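The workflow described in the quoted documentation can be sketched roughly as follows. This is only a sketch under assumptions: the file names (train.txt, new.txt, train.mallet, new.mallet, inferencer.mallet, new-doc-topics.txt) are hypothetical placeholders, and the small `run` helper is an illustrative addition that prints each command instead of executing it when MALLET is not found. The key step is importing the new documents with --use-pipe-from, so that they go through the same pipe as the training data.

```shell
#!/bin/sh
# Sketch only: all file names below are hypothetical placeholders.
MALLET="${MALLET:-bin/mallet}"   # path to the MALLET launcher (assumed location)
TRAIN_TXT=train.txt              # training documents, one per line
NEW_TXT=new.txt                  # new documents to infer topics for
TRAIN_MALLET=train.mallet        # imported training instances
NEW_MALLET=new.mallet            # imported new instances
INFERENCER=inferencer.mallet     # inference tool saved during training

# Illustrative helper: run the command if MALLET is available,
# otherwise just print it (dry run).
run() {
  if command -v "$MALLET" >/dev/null 2>&1; then
    "$@"
  else
    echo "$*"
  fi
}

# 1. Import the training data; --keep-sequence is required for topic modeling.
run "$MALLET" import-file --input "$TRAIN_TXT" --output "$TRAIN_MALLET" \
    --keep-sequence --remove-stopwords

# 2. Train 20 topics and save an inferencer based on the trained model.
run "$MALLET" train-topics --input "$TRAIN_MALLET" --num-topics 20 \
    --inferencer-filename "$INFERENCER"

# 3. Import the new documents through the SAME pipe as the training data.
run "$MALLET" import-file --input "$NEW_TXT" --output "$NEW_MALLET" \
    --use-pipe-from "$TRAIN_MALLET"

# 4. Infer topic distributions for the new documents.
run "$MALLET" infer-topics --input "$NEW_MALLET" --inferencer "$INFERENCER" \
    --output-doc-topics new-doc-topics.txt
```

If step 3 is done without --use-pipe-from, the new file is imported with its own pipe rather than the one used for training, which would be one plausible explanation for topic proportions that look unrelated to the document's words.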

Regarding nlp - About LDA inference, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4374296/
