python-2.7 - NLTK - How to use NER

Tags: python-2.7 nltk

How can I call NER from NLTK to get the results for the first 200 characters of every TXT file in the same directory?


When I try this code:

for filename in os.listdir(ebooksFolder):
    fname, fextension = os.path.splitext(filename)
    if (fextension == '.txt'):
        newName = 'ner_' + filename
        file = open(ebooksFolder + '\\' + filename)
        rawFile = file.read()
        partToUse = rawFile[:50]
        segmentedSentences = nltk.sent_tokenize(partToUse)
        tokenizedSentences = [nltk.word_tokenize(sent) for sent in segmentedSentences]
        posTaggedSentences = [nltk.pos_tag(sent) for sent in tokenizedSentences]
        nerResult = nltk.ne_chunk(posTaggedSentences)
        pathToCopy = 'C:\\Users\\Felipe\\Desktop\\books_txt\\'
        nameToSave = os.path.join(pathToCopy, newName + '.txt')
        newFile = open(nameToSave, 'w')
        newFile.write(nerResult)
        newFile.close()

I get these errors:

Traceback (most recent call last):
  File "<pyshell#77>", line 11, in <module>
    nerResult = nltk.ne_chunk(posTaggedSentences)
  File "C:\Python27\lib\site-packages\nltk\chunk\__init__.py", line 177, in ne_chunk
    return chunker.parse(tagged_tokens)
  File "C:\Python27\lib\site-packages\nltk\chunk\named_entity.py", line 116, in parse
    tagged = self._tagger.tag(tokens)
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 58, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 78, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 554, in choose_tag
    featureset = self.feature_detector(tokens, index, history)
  File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 605, in feature_detector
    return self._feature_detector(tokens, index, history)
  File "C:\Python27\lib\site-packages\nltk\chunk\named_entity.py", line 49, in _feature_detector
    pos = simplify_pos(tokens[index][1])
  File "C:\Python27\lib\site-packages\nltk\chunk\named_entity.py", line 178, in simplify_pos
    if s.startswith('V'): return "V"
AttributeError: 'tuple' object has no attribute 'startswith'

Best answer

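nltk.ne_chunk expects a single POS-tagged sentence, that is, a list of (word, tag) tuples. Passing the whole list of sentences makes it treat each sentence as a token, so the tag it looks up is itself a tuple, which is what the AttributeError reports. After tokenizing the text into sentences and POS-tagging them, you need to iterate over the list of tagged sentences, like this: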

nerResult = [nltk.ne_chunk(pts) for pts in posTaggedSentences]

Instead of this:

nerResult = nltk.ne_chunk(posTaggedSentences)
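
For reference, here is a minimal sketch of how the whole loop might look with that fix applied. It assumes NLTK and its tokenizer, tagger and chunker models are installed. The function names ner_for_text and write_ner_files, the nltk_module parameter (there only so the pipeline can be swapped out or stubbed) and the UTF-8 encoding are assumptions made for illustration, not part of the original post. It also converts each tree to text before writing, since write() needs a string rather than a list of trees.

```python
import io
import os


def ner_for_text(text, limit=200, nltk_module=None):
    """Return one NE-chunked result per sentence found in text[:limit]."""
    if nltk_module is None:
        # Imported lazily; requires NLTK and its models to be installed.
        import nltk as nltk_module
    part_to_use = text[:limit]
    sentences = nltk_module.sent_tokenize(part_to_use)
    tokenized = [nltk_module.word_tokenize(sent) for sent in sentences]
    tagged = [nltk_module.pos_tag(tokens) for tokens in tokenized]
    # ne_chunk takes ONE tagged sentence, so call it once per sentence.
    return [nltk_module.ne_chunk(sent) for sent in tagged]


def write_ner_files(source_dir, target_dir, limit=200, nltk_module=None):
    """Write 'ner_<name>.txt' into target_dir for every .txt in source_dir."""
    written = []
    for filename in sorted(os.listdir(source_dir)):
        if os.path.splitext(filename)[1] != '.txt':
            continue
        # Assumption: the input files are UTF-8 encoded.
        with io.open(os.path.join(source_dir, filename), encoding='utf-8') as f:
            text = f.read()
        trees = ner_for_text(text, limit, nltk_module)
        out_path = os.path.join(target_dir, 'ner_' + filename)
        with io.open(out_path, 'w', encoding='utf-8') as out:
            # write() needs text, so turn each tree into its string form.
            out.write(u'\n'.join(u'{}'.format(tree) for tree in trees))
        written.append(out_path)
    return written
```

With the folders from the question, this could be called as write_ner_files(ebooksFolder, 'C:\\Users\\Felipe\\Desktop\\books_txt'). Using os.path.join and with-blocks avoids the hand-built path separators and makes sure each file is closed even if chunking fails.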

Regarding python-2.7 - NLTK - How to use NER, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23945364/
