nlp - How to use the Stanford sentiment analysis dataset

Tags: nlp stanford-nlp deep-learning sentiment-analysis recurrent-neural-network

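I am trying to use the Stanford sentiment analysis dataset for some sentiment analysis research. I downloaded the dataset from http://nlp.stanford.edu/sentiment/index.html. After reading the README file, I am still somewhat confused.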

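First question: line 50446 of dictionary.txt shows that the phrase ID of this sentence is 226166. When I search sentiment_labels.txt, I find on line 226168 that the sentiment value of phrase 226166 is 0.69444. But the sentence on line 50445 of dictionary.txt is the same as the sentence on line 50446, yet it has a different sentiment value in sentiment_labels.txt. Why?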

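Second question: some sentiment analysis papers train their models not only on the full-length training sentences but also on the labeled phrases that occur as subparts of those sentences. But I found some useless phrases in dictionary.txt, for example on lines 2 and 3. Should I use these useless phrases to train my model?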

Best answer

The format of dictionary.txt is

<Phrase>|<ID>

The format of sentiment_labels.txt is

<Phrase ID>|<Score>

For example:

id: 50445  phrase: control of both his medium and his message  score: .777
id: 50446  phrase: controlled display of murderous vulnerability ensures that malice has a very human face  score: .444
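The two formats above suggest joining the files on the phrase ID rather than on line numbers, since a line number in either file is not necessarily a phrase ID. Below is a minimal Python sketch of that join. The function names (load_dictionary, load_labels, join_phrases) are hypothetical, and the sample lines simply reuse the IDs and scores from the example above. It assumes that sentiment_labels.txt may start with a non-numeric header line, which is skipped, and that a phrase may itself contain a '|' character, so the dictionary line is split on the last '|'.

```python
def load_dictionary(lines):
    """Map phrase ID -> phrase from lines of the form <Phrase>|<ID>."""
    phrases = {}
    for line in lines:
        line = line.rstrip("\n")
        # Split on the LAST '|' in case the phrase itself contains one.
        phrase, sep, phrase_id = line.rpartition("|")
        if not sep or not phrase_id.strip().isdigit():
            continue  # skip blank or malformed lines
        phrases[int(phrase_id)] = phrase
    return phrases


def load_labels(lines):
    """Map phrase ID -> score from lines of the form <Phrase ID>|<Score>."""
    scores = {}
    for line in lines:
        phrase_id, sep, score = line.strip().partition("|")
        if not sep or not phrase_id.isdigit():
            continue  # skip a header line (assumed non-numeric) or blanks
        scores[int(phrase_id)] = float(score)
    return scores


def join_phrases(phrases, scores):
    """Return phrase ID -> (phrase, score) for IDs present in both files."""
    return {
        pid: (phrase, scores[pid])
        for pid, phrase in phrases.items()
        if pid in scores
    }


# Illustrative sample lines built from the example in the answer.
sample_dictionary = [
    "control of both his medium and his message|50445\n",
    "controlled display of murderous vulnerability ensures that malice "
    "has a very human face|50446\n",
]
sample_labels = [
    "phrase ids|sentiment values\n",  # assumed header line, skipped
    "50445|.777\n",
    "50446|.444\n",
]

phrases = load_dictionary(sample_dictionary)
scores = load_labels(sample_labels)
joined = join_phrases(phrases, scores)

for pid, (phrase, score) in sorted(joined.items()):
    print(pid, score, phrase)
```

To run this against the real files, pass open file objects instead of the sample lists, for example load_dictionary(open("dictionary.txt", encoding="utf-8")). The file paths and encoding here are assumptions about the local copy of the dataset.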

Source: this question on Stack Overflow: https://stackoverflow.com/questions/37305686/
