The following code snippet (adapted from the spaCy sample code) raises a KeyError that I can't figure out:
import en_core_web_sm
from spacy.gold import GoldParse

nlp = en_core_web_sm.load()
nlp.entity.add_label('ACCT')

TRAIN_DATA = [
    ("Exxon opened a new processing facility", {
        "entities": [(0, 5, "ACCT")]
    }),
    ("another example sentence", {
        "entities": []
    }),
    ("Shell is an oil company, and so is Chevron.", {
        "entities": [(0, 5, "ACCT"), (35, 42, "ACCT")]
    }),
    ("Texaco?", {
        "entities": [(0, 6, "ACCT")]
    })
]

# Add new words to vocab
for raw_text, _ in TRAIN_DATA:
    doc = nlp.make_doc(raw_text)
    for word in doc:
        _ = nlp.vocab[word.orth]

loss = 0.
for raw_text, entity_offsets in TRAIN_DATA:
    doc = nlp.make_doc(raw_text)
    gold = GoldParse(doc, entities=entity_offsets)
    loss += nlp.entity.update(doc, gold, drop=0.9)
The error is:
KeyError                                  Traceback (most recent call last)
<ipython-input-27-bbf3e1dc4d39> in <module>()
     33 for raw_text, entity_offsets in TRAIN_DATA:
     34     doc = nlp.make_doc(raw_text)
---> 35     gold = GoldParse(doc, entities=entity_offsets)
     36     loss += nlp.entity.update(doc, gold, drop=0.9)
     37

gold.pyx in spacy.gold.GoldParse.__init__()

KeyError: 0
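I see this error in both spaCy 2.0.3 and spaCy 1.9.
When I run similar code in a Flask application, I get additional traceback information showing that the line that actually fails is in the gold.pyx file:
elif not isinstance(entities[0], basestring):
Can anyone help explain what is going on?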
Best answer
I don't know how the spaCy sample code ever worked, but GoldParse expects entities to be a list, not a dict. Changing the line to:
gold = GoldParse(doc, entities=entity_offsets.get('entities'))
fixes the problem.
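To see why the message is KeyError: 0, here is a minimal sketch in plain Python that does not need spaCy. The helper first_entity is hypothetical: it only imitates the entities[0] access reported in the traceback and is not spaCy's actual code. Indexing a dict with [0] is a key lookup, and the annotation dict has no key 0, whereas the list stored under "entities" can be indexed normally.

```python
# Hypothetical stand-in for the entities[0] access seen in the traceback;
# this is not spaCy's actual code.
def first_entity(entities):
    return entities[0] if len(entities) > 0 else None

# One annotation shaped like the second element of each TRAIN_DATA tuple.
annotations = {"entities": [(0, 5, "ACCT")]}

# Passing the whole dict: [0] is treated as a key lookup, and no key 0 exists.
try:
    first_entity(annotations)
    error = None
except KeyError as exc:
    error = exc

# Passing the list stored under "entities": [0] is an ordinary index.
offsets = annotations.get("entities", [])
first = first_entity(offsets)

print(type(error).__name__, error.args)
print(first)
```

The same unpacking applies inside the training loop: pass entity_offsets.get('entities') (or entity_offsets['entities']) rather than the dict itself.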
Regarding python - KeyError in spaCy GoldParse, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47807189/