python - Truecasing - SpaCy

Tags: python nltk spacy

The goal is to capitalize words based on their POS tags, which I can already do with the help of the link below.

How can I best determine the correct capitalization for a word?

I am now trying to get similar results using spaCy:

def truecase(doc):
    truecased_sents = [] # list of truecased sentences
    tagged_sent = token.tag_([word.lower() for token in doc])
    normalized_sent = [w.capitalize() if t in ["NN","NNS"] else w for (w,t) in tagged_sent]
    normalized_sent[0] = normalized_sent[0].capitalize()
    string = re.sub(" (?=[\.,'!?:;])", "", ' '.join(normalized_sent))
    return string

It throws this error:

  tagged_sent = token.tag_([word.lower() for token in doc])
NameError: global name 'token' is not defined

How do I declare token as a global and fix this? Is my approach correct?

Best answer

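import re
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u'autonomous cars shift insurance liability toward manufacturers.')
# Pair each token's text with its fine-grained POS tag (token.tag_)
tagged_sent = [(w.text, w.tag_) for w in doc]
# Capitalize singular and plural common nouns (NN, NNS)
normalized_sent = [w.capitalize() if t in ["NN","NNS"] else w for (w,t) in tagged_sent]
# Always capitalize the first word of the sentence
normalized_sent[0] = normalized_sent[0].capitalize()
# Remove the space that ' '.join puts before punctuation
string = re.sub(r" (?=[.,'!?:;])", "", ' '.join(normalized_sent))
print(string)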

Output: Autonomous Cars shift Insurance Liability toward Manufacturers.
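
The NameError in the question arises because token only exists inside the list comprehension and tag_ is an attribute of each token, not a callable, so no global declaration is needed. As a rough sketch, the answer's logic could be wrapped back into a function like the one the question attempted. The names truecase_tagged and NOUN_TAGS are illustrative, not part of spaCy, and truecase assumes its argument is a spaCy Doc or any iterable of objects exposing .text and .tag_.

```python
import re

# Penn Treebank tags for singular and plural common nouns
# (illustrative constant, not a spaCy name)
NOUN_TAGS = {"NN", "NNS"}


def truecase_tagged(tagged_sent):
    """Capitalize nouns and the first word in a list of (text, tag) pairs."""
    if not tagged_sent:
        return ""
    words = [w.capitalize() if t in NOUN_TAGS else w for w, t in tagged_sent]
    # The first word of the sentence is always capitalized
    words[0] = words[0].capitalize()
    # Drop the space that " ".join inserts before punctuation
    return re.sub(r" (?=[.,'!?:;])", "", " ".join(words))


def truecase(doc):
    """Truecase a spaCy Doc (or any iterable of tokens with .text and .tag_)."""
    return truecase_tagged([(token.text, token.tag_) for token in doc])
```

With a loaded pipeline, this would be called as truecase(nlp(u'autonomous cars shift insurance liability toward manufacturers.')). The result depends on the tags the model assigns.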

A similar question about python - Truecasing - SpaCy can be found on Stack Overflow: https://stackoverflow.com/questions/48030217/
