When I input the sentence:
"So excited to be back! We're here to reconnect with & meet new innovators at ghc16"
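the returned sentiment is negative. I cannot understand why this happens. The sentence is positive, yet a negative value is still returned.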
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

class SentimentAnalyzer {

    public TweetWithSentiment findSentiment(String line) {
        if (line == null || line.isEmpty()) {
            throw new IllegalArgumentException("The line must not be null or empty.");
        }
        Annotation annotation = processLine(line);
        int mainSentiment = findMainSentiment(annotation);
        // You should avoid magic numbers like 0 or 4; try to create a constant that describes what the value means
        if (mainSentiment < 0 || mainSentiment > 4) {
            return null; // You should avoid null returns
        }
        return new TweetWithSentiment(line, toCss(mainSentiment));
    }

    // Maps the predicted class (0..4) to a readable label.
    private String toCss(int sentiment) {
        switch (sentiment) {
            case 0:
                return "very negative";
            case 1:
                return "negative";
            case 2:
                return "neutral";
            case 3:
                return "positive";
            case 4:
                return "very positive";
            default:
                return "default";
        }
    }

    // Returns the predicted class of the longest sentence in the annotation.
    private int findMainSentiment(Annotation annotation) {
        int mainSentiment = Integer.MIN_VALUE;
        int longest = Integer.MIN_VALUE;
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Debug output for every token of the sentence
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
                System.out.println("word: " + word);
                System.out.println("pos: " + pos);
                System.out.println("ne: " + ne);
                System.out.println("Lemmas: " + lemma);
            }
            int sentenceLength = String.valueOf(sentence).length();
            if (sentenceLength > longest) {
                Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                mainSentiment = RNNCoreAnnotations.getPredictedClass(tree);
                longest = sentenceLength;
            }
        }
        return mainSentiment;
    }

    private Annotation processLine(String line) {
        StanfordCoreNLP pipeline = createPipeline();
        return pipeline.process(line);
    }

    private StanfordCoreNLP createPipeline() {
        Properties props = createPipelineProperties();
        return new StanfordCoreNLP(props);
    }

    private Properties createPipelineProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
        return props;
    }
}
Best answer
This is yet another case of a technical limitation of the NLP library itself, mainly around a few specific points:
- Ambiguous sentiment words - "This product works terribly" vs. "This product is terribly good"
- Missed negations - "I would never in a million years say that this product is worth buying"
- Quoted/indirect text - "My dad says this product is terrible, but I disagree"
- Comparisons - "This product is about as useful as a hole in the head"
- Anything subtle - "This product is ugly, slow and uninspiring, but it's the only thing on the market that does the job"
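In your example, there is nothing wrong with the algorithm. Let's analyze some parts of the text separately:
So excited to be back! -> positive
We're here to reconnect with -> neutral
meet new innovators at ghc16 -> neutral
With a simple average, we would get a value between neutral and positive. However, as we have seen, the algorithm is not predictable, which is why the result changes if you add a single word to the text (the & is also not interpreted well):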
So excited to be back! We're here to reconnect with you and meet new innovators at ghc16
...the result returned will be neutral.
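To make the "simple average" mentioned above concrete, here is a minimal sketch. It assumes the class indices from the question's toCss mapping (0 = very negative up to 4 = very positive) and uses the three fragment labels from the analysis above; SimpleAverageDemo and average are hypothetical names, not part of CoreNLP.

```java
import java.util.Arrays;

// Sketch only: class indices follow the question's toCss mapping
// (0 = very negative ... 4 = very positive).
final class SimpleAverageDemo {

    // Arithmetic mean of the per-fragment sentiment classes.
    static double average(int[] classes) {
        return Arrays.stream(classes).average()
                .orElseThrow(() -> new IllegalArgumentException("no scores"));
    }

    public static void main(String[] args) {
        // positive, neutral, neutral: the three fragments analyzed above
        int[] fragmentClasses = {3, 2, 2};
        System.out.println(average(fragmentClasses));
    }
}
```

The mean of 3, 2 and 2 is roughly 2.33, which lies between neutral (2) and positive (3). Note that the code in the question does not average at all: it returns the class of the longest sentence only.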
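Suggestions:
- Do not treat a sentiment of 1 as something negative, since you will run into situations like this one;
- Where you can control it, keep the text as correct and concise as possible to get better results;
- Split the text into as many sentences as possible and run the algorithm on each sentence separately. Then compute a custom average based on your own test cases.
If none of these fit, consider switching to another machine-learning technique.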
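The third suggestion (score each sentence separately, then combine the scores with your own average) could be sketched roughly as follows. Everything here is an assumption for illustration: the scorer is a hypothetical stand-in for a per-sentence CoreNLP call, the regex splitter is deliberately naive, and weighting by sentence length is just one arbitrary choice that you would tune against your own test cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class PerSentenceSentiment {

    static final String[] LABELS =
            {"very negative", "negative", "neutral", "positive", "very positive"};

    // Naive splitter for illustration: breaks after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.trim().split("(?<=[.!?])\\s+")) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }

    // Length-weighted mean of the per-sentence classes (0..4).
    // The weighting scheme is an assumption, not something CoreNLP prescribes.
    static double weightedAverage(List<String> sentences, ToIntFunction<String> scorer) {
        double weightedSum = 0;
        double totalWeight = 0;
        for (String sentence : sentences) {
            int weight = sentence.length();
            weightedSum += weight * scorer.applyAsInt(sentence);
            totalWeight += weight;
        }
        if (totalWeight == 0) {
            throw new IllegalArgumentException("no sentences to score");
        }
        return weightedSum / totalWeight;
    }

    // Rounds the average to the nearest class and clamps it to the valid range.
    static String label(double average) {
        int index = (int) Math.round(average);
        return LABELS[Math.max(0, Math.min(LABELS.length - 1, index))];
    }

    public static void main(String[] args) {
        String tweet = "So excited to be back! "
                + "We're here to reconnect with & meet new innovators at ghc16";
        // Hypothetical scorer: pretend the first sentence is positive (3)
        // and the second one is classified as negative (1).
        ToIntFunction<String> scorer = s -> s.startsWith("So excited") ? 3 : 1;
        List<String> sentences = splitSentences(tweet);
        double average = weightedAverage(sentences, scorer);
        System.out.println(sentences.size() + " sentences -> " + label(average));
    }
}
```

With these made-up scores the combined result rounds to neutral rather than negative, which is the kind of smoothing the suggestion is after. In a real setup the scorer would run the sentiment pipeline on one sentence at a time.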
Regarding java - incorrect output when using Stanford CoreNLP sentiment analysis, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41378527/