python - Sentiment analysis with Stanford CoreNLP in Python

Tags: python, stanford-nlp

I am learning NLP and have just installed Stanford CoreNLP. I am using Windows 10 and have installed Python 3 with Anaconda 3. I have also installed pycorenlp 0.3.

I run CoreNLP with the following command from the directory where I downloaded and unzipped the files:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000


In my Jupyter Notebook, I ran the following code, which I found on the web:

import json, requests

class StanfordCoreNLP:
    """
    Modified from https://github.com/smilli/py-corenlp
    """
    def __init__(self, server_url):
        # TODO: Error handling? More checking on the url?
        if server_url[-1] == '/':
            server_url = server_url[:-1]
        self.server_url = server_url

    def annotate(self, text, properties=None):
        assert isinstance(text, str)

        if properties is None:
            properties = {}
        else:
            assert isinstance(properties, dict)

        # Checks that the Stanford CoreNLP server is started.
        try:
            requests.get(self.server_url)
        except requests.exceptions.ConnectionError:
            raise Exception('Check whether you have started the CoreNLP server e.g.\n'
                            '$ cd <path_to_core_nlp_folder>/stanford-corenlp-full-2016-10-31/ \n'
                            '$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port <port>')

        data = text.encode()
        r = requests.post(
            self.server_url, params={
                'properties': str(properties)
            }, data=data, headers={'Connection': 'close'})

        output = r.text

        if ('outputFormat' in properties
                and properties['outputFormat'] == 'json'):
            try:
                output = json.loads(output, encoding='utf-8', strict=True)
            except:
                pass
        return output

def sentiment_analysis_on_sentence(sentence):
    # The StanfordCoreNLP server is running on http://127.0.0.1:9000
    nlp = StanfordCoreNLP('http://127.0.0.1:9000 (http://127.0.0.1:9000)')
    # Json response of all the annotations
    output = nlp.annotate(sentence, properties={
        "annotators": "tokenize,ssplit,parse,sentiment",
        "outputFormat": "json",
        # Only split the sentence at End Of Line. We assume that this method only takes in one single sentence.
        "ssplit.eolonly": "true",
        # Setting enforceRequirements to skip some annotators and make the process faster
        "enforceRequirements": "false"
    })
    # Only care about the result of the first sentence because we assume we only annotate a single sentence
    return int(output['sentences'][0]['sentimentValue'])

However, when I run the following in my Jupyter Notebook:

sentiment_analysis_on_sentence('I like the service.')

I get this exception:

---------------------------------------------------------------------------
LocationParseError                        Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\requests\models.py in prepare_url(self, url, params)
    370         try:
--> 371             scheme, auth, host, port, path, query, fragment = parse_url(url)
    372         except LocationParseError as e:

C:\ProgramData\Anaconda3\lib\site-packages\urllib3\util\url.py in parse_url(url)
    198             if not port.isdigit():
--> 199                 raise LocationParseError(url)
    200             try:

LocationParseError: Failed to parse: 127.0.0.1:9000 (http:

During handling of the above exception, another exception occurred:

InvalidURL                                Traceback (most recent call last)
<ipython-input-142-e4763a0324a6> in <module>()
----> 1 sentiment_analysis_on_sentence('I like the service.')

<ipython-input-141-9cf27500efe3> in sentiment_analysis_on_sentence(sentence)
     54                     "ssplit.eolonly": "true",
     55                     # Setting enforceRequirements to skip some annotators and make the process faster
---> 56                     "enforceRequirements": "false"
     57                     })
     58             # Only care about the result of the first sentence because we assume we only annotate a single sentence

<ipython-input-141-9cf27500efe3> in annotate(self, text, properties)
     22             # Checks that the Stanford CoreNLP server is started.
     23             try:
---> 24                 requests.get(self.server_url)
     25             except requests.exceptions.ConnectionError:
     26                 raise Exception('Check whether you have started the CoreNLP server e.g.\n'

C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
     70 
     71     kwargs.setdefault('allow_redirects', True)
---> 72     return request('get', url, params=params, **kwargs)
     73 
     74 

C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
     56     # cases, and look like a memory leak in others.
     57     with sessions.Session() as session:
---> 58         return session.request(method=method, url=url, **kwargs)
     59 
     60 

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    492             hooks=hooks,
    493         )
--> 494         prep = self.prepare_request(req)
    495 
    496         proxies = proxies or {}

C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py in prepare_request(self, request)
    435             auth=merge_setting(auth, self.auth),
    436             cookies=merged_cookies,
--> 437             hooks=merge_hooks(request.hooks, self.hooks),
    438         )
    439         return p

C:\ProgramData\Anaconda3\lib\site-packages\requests\models.py in prepare(self, method, url, headers, files, data, params, auth, cookies, hooks, json)
    303 
    304         self.prepare_method(method)
--> 305         self.prepare_url(url, params)
    306         self.prepare_headers(headers)
    307         self.prepare_cookies(cookies)

C:\ProgramData\Anaconda3\lib\site-packages\requests\models.py in prepare_url(self, url, params)
    371             scheme, auth, host, port, path, query, fragment = parse_url(url)
    372         except LocationParseError as e:
--> 373             raise InvalidURL(*e.args)
    374 
    375         if not scheme:

InvalidURL: Failed to parse: 127.0.0.1:9000 (http:

How can I fix this?

Best Answer

On line 48,

nlp = StanfordCoreNLP('http://127.0.0.1:9000 (http://127.0.0.1:9000)')

you should delete the (http://127.0.0.1:9000) that follows the first URL and its port.
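With that extra text removed, the URL parses correctly. A minimal sketch of the fix (the clean_server_url helper is hypothetical, not part of pycorenlp; it just drops the "(http://…)" residue that copy-pasting from a rendered web page can append):

```python
from urllib.parse import urlparse

# Hypothetical helper (not part of pycorenlp): keep only the first
# whitespace-separated token and drop any trailing slash, so that
# copy-paste residue like " (http://127.0.0.1:9000)" is removed.
def clean_server_url(url):
    return url.split()[0].rstrip('/')

fixed = clean_server_url('http://127.0.0.1:9000 (http://127.0.0.1:9000)')
print(fixed)                 # http://127.0.0.1:9000
print(urlparse(fixed).port)  # 9000
```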

Further steps:

As you can see in your command-line log, Stanford CoreNLP uses the lexparser rather than the shift-reduce parser. That is good, because sentiment analysis and the shift-reduce parser currently have problems working together. To make sure both keep working, you should add

    "parse.model": "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"

to the properties of the call (see lines 50 f.).
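Combined with the properties already used in the question's code, the dict passed to annotate would then look like this (a sketch; the model path is the one given above):

```python
# Properties for the annotate() call, with the lexparser model pinned
# explicitly so sentiment analysis does not clash with the
# shift-reduce parser.
properties = {
    "annotators": "tokenize,ssplit,parse,sentiment",
    "outputFormat": "json",
    "ssplit.eolonly": "true",
    "enforceRequirements": "false",
    "parse.model": "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
}
print(properties["parse.model"])  # edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz
```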

For German users: if the Stanford CoreNLP server is started with the wrong locale, you may still end up with errors because the server returns an invalid JSON file. In that case, add the following parameters to your server startup:

-Duser.language=en -Duser.country=US
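Combined with the startup command from the question, the full invocation would look like this (a sketch; note that the -D flags are JVM options and must appear before the class name):

```shell
java -mx4g -Duser.language=en -Duser.country=US -cp "*" \
     edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
```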

Regarding "python - Sentiment analysis with Stanford CoreNLP in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54206651/
