java - Incorrect output when using Stanford CoreNLP sentiment analysis

Tags: java stanford-nlp sentiment-analysis

When I pass in this sentence:

"So excited to be back! We're here to reconnect with & meet new innovators at ghc16"

the sentiment returned is negative. I can't work out why this happens: the statement is positive, yet a negative value comes back.

    import java.util.Properties;

    // Package names as used in recent CoreNLP releases.
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.CoreMap;

    // TweetWithSentiment is the asker's own class; it is not shown in the question.
    class SentimentAnalyzer {

        public TweetWithSentiment findSentiment(String line) {
            if (line == null || line.isEmpty()) {
                throw new IllegalArgumentException("The line must not be null or empty.");
            }

            Annotation annotation = processLine(line);
            int mainSentiment = findMainSentiment(annotation);

            // You should avoid magic numbers like 2 or 4; try to create a constant that describes why 2.
            if (mainSentiment < 0 || mainSentiment > 4) {
                return null; // You should avoid null returns.
            }

            return new TweetWithSentiment(line, toCss(mainSentiment));
        }

        // Maps the predicted class (0..4) to a human-readable label.
        private String toCss(int sentiment) {
            switch (sentiment) {
            case 0:
                return "very negative";
            case 1:
                return "negative";
            case 2:
                return "neutral";
            case 3:
                return "positive";
            case 4:
                return "very positive";
            default:
                return "default";
            }
        }

        // Returns the predicted class of the longest sentence in the annotation.
        private int findMainSentiment(Annotation annotation) {
            int mainSentiment = Integer.MIN_VALUE;
            int longest = Integer.MIN_VALUE;

            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {

                // Debug output for every token of the sentence.
                for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                    String word = token.get(CoreAnnotations.TextAnnotation.class);
                    String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                    String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                    String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);

                    System.out.println("word: " + word);
                    System.out.println("pos: " + pos);
                    System.out.println("ne: " + ne);
                    System.out.println("Lemmas: " + lemma);
                }

                int sentenceLength = String.valueOf(sentence).length();

                // Only the longest sentence decides the overall sentiment.
                if (sentenceLength > longest) {
                    Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                    mainSentiment = RNNCoreAnnotations.getPredictedClass(tree);
                    longest = sentenceLength;
                }
            }

            return mainSentiment;
        }

        private Annotation processLine(String line) {
            // Note: this builds a new pipeline (and loads the models) on every call.
            StanfordCoreNLP pipeline = createPipeline();
            return pipeline.process(line);
        }

        private StanfordCoreNLP createPipeline() {
            Properties props = createPipelineProperties();
            return new StanfordCoreNLP(props);
        }

        private Properties createPipelineProperties() {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
            return props;
        }
    }

Best answer

This is yet another case of a technical limitation of the NLP library itself, mainly around a few specific points:

  1. Ambiguous sentiment words - "This product works terribly" vs. "This product is terribly good"

  2. Missed negations - "I would never in a million years say that this product is worth buying"

  3. Quoted/Indirect text - "My dad says this product is terrible, but I disagree"

  4. Comparisons - "This product is about as useful as a hole in the head"

  5. Anything subtle - "This product is ugly, slow and uninspiring, but it's the only thing on the market that does the job"

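In your example there is nothing wrong with the algorithm. Let's analyze the parts of the text separately:

  • So excited to be back! -> positive
  • We're here to reconnect with -> neutral
  • meet new innovators at ghc16 -> neutral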

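With a simple average we would get a value between neutral and positive. However, as we have seen, the algorithm is not predictable, which is why adding a single word to your text changes the result (the & is also not interpreted well):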

So excited to be back! We're here to reconnect with you and meet new innovators at ghc16

...the result comes back neutral.
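Since the answer observes that the & is not interpreted well, one option is to normalize the text before handing it to the pipeline. The following is only a minimal sketch: `TweetNormalizer` and `normalize` are hypothetical names, not part of CoreNLP, and the sketch only rewrites a standalone "&" as "and". Whether that alone changes the predicted class is not guaranteed; in the answer's rewritten sentence the word "you" was added as well.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of CoreNLP): cleans up a tweet before analysis.
final class TweetNormalizer {

    // A "&" that is not glued to other characters, e.g. "with & meet" but not "AT&T".
    private static final Pattern STANDALONE_AMPERSAND = Pattern.compile("(?<!\\S)&(?!\\S)");

    private TweetNormalizer() {
    }

    // Replaces standalone ampersands with "and" and collapses runs of whitespace.
    static String normalize(String text) {
        String replaced = STANDALONE_AMPERSAND.matcher(text).replaceAll("and");
        return replaced.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String tweet = "So excited to be back! We're here to reconnect with & meet new innovators at ghc16";
        // prints: So excited to be back! We're here to reconnect with and meet new innovators at ghc16
        System.out.println(normalize(tweet));
    }
}
```

The normalized string would then be passed to `findSentiment` in place of the raw tweet.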


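Suggestions:

  1. Do not treat sentiment 1 as strictly negative, since you will run into cases like this one;
  2. Where you have control over it, keep the text correct and concise to get better results;
  3. Split the text into sentences as much as possible and run the algorithm on each sentence separately. Then compute a custom average based on your own test cases.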

If none of these fit, consider switching to another machine-learning technique.
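Suggestions 1 and 3 can be combined in a small aggregation step that runs after each sentence has been scored. This is a rough sketch under stated assumptions: `SentimentAverager`, `average` and `toLabel` are hypothetical names, the input classes (0 to 4) are expected to come from `RNNCoreAnnotations.getPredictedClass` for each sentence, and the thresholds in `toLabel` are placeholders to tune against your own test cases. The values 3, 2, 2 in `main` are the per-part results from the breakdown above, not the output of a fresh model run.

```java
import java.util.Locale;

// Hypothetical helper: turns per-sentence sentiment classes (0..4) into one coarse label.
final class SentimentAverager {

    private SentimentAverager() {
    }

    // Plain arithmetic mean of the per-sentence classes.
    static double average(int... sentenceClasses) {
        if (sentenceClasses.length == 0) {
            throw new IllegalArgumentException("At least one sentence class is required.");
        }
        int sum = 0;
        for (int sentenceClass : sentenceClasses) {
            if (sentenceClass < 0 || sentenceClass > 4) {
                throw new IllegalArgumentException("Sentiment class out of range: " + sentenceClass);
            }
            sum += sentenceClass;
        }
        return (double) sum / sentenceClasses.length;
    }

    // Assumed thresholds: an average of exactly 1 ("negative") is not yet reported as negative.
    static String toLabel(double average) {
        if (average < 1.0) {
            return "negative";
        }
        if (average < 2.5) {
            return "neutral";
        }
        return "positive";
    }

    public static void main(String[] args) {
        double avg = average(3, 2, 2); // positive, neutral, neutral
        // prints: 2.33 -> neutral
        System.out.println(String.format(Locale.ROOT, "%.2f", avg) + " -> " + toLabel(avg));
    }
}
```

A weighted mean, for example by sentence length, would fit the same shape; the unweighted version is used here because the answer speaks of a simple average.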

Regarding "java - Incorrect output when using Stanford CoreNLP sentiment analysis", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41378527/
