python - ValueError: [E024] Could not find an optimal move to supervise the parser

Tags: python python-3.x nlp spacy named-entity-recognition

I get the following error when training a spaCy NER model on custom training data.

ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?

Can anyone help me solve this?

Best answer

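Passing the training data through the function below made training run without any errors. It trims leading and trailing whitespace from each entity span, which appears to have been what triggered the error here.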

import re


def trim_entity_spans(data: list) -> list:
    """Removes leading and trailing whitespace from entity spans.

    Args:
        data (list): Training examples as (text, {'entities': [...]}) pairs,
            where each entity is (start, end, label).

    Returns:
        list: The cleaned data.
    """
    invalid_span_tokens = re.compile(r'\s')

    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            # Move the start offset right, past any leading whitespace.
            while valid_start < valid_end and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            # Move the end offset left, past any trailing whitespace,
            # without crossing the (already trimmed) start offset.
            while valid_end > valid_start and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])

    return cleaned_data
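
Before cleaning, it can help to see which annotations are actually affected. The following is only a minimal sketch: `find_whitespace_spans` and the `sample` data are hypothetical names made up for illustration, not part of spaCy, and it assumes the same `(text, {'entities': [...]})` layout as above. It uses `str.strip()`, whose notion of whitespace is close to, but not identical to, the regex `\s`.

```python
from typing import List, Tuple


def find_whitespace_spans(data) -> List[Tuple[int, int, int, str]]:
    """Hypothetical helper: list entity spans that start or end with whitespace.

    Returns (example_index, start, end, label) for every offending span.
    """
    problems = []
    for example_index, (text, annotations) in enumerate(data):
        for start, end, label in annotations["entities"]:
            span = text[start:end]
            # An empty span, or one that changes when stripped, is misaligned.
            if not span or span != span.strip():
                problems.append((example_index, start, end, label))
    return problems


# Made-up sample: the first span (0, 6) covers "Alice " with a trailing space,
# the second span (15, 24) covers "Acme Corp" and is clean.
sample = [
    (
        "Alice works at Acme Corp in Paris",
        {"entities": [(0, 6, "PERSON"), (15, 24, "ORG")]},
    ),
]

for example_index, start, end, label in find_whitespace_spans(sample):
    print(f"example {example_index}: {label} span ({start}, {end}) has stray whitespace")
```

If this reports nothing and the error persists, the other hint in the error message (labels that were never added to the model) would be the next thing to check.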

Regarding python - ValueError: [E024] Could not find an optimal move to supervise the parser, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56642816/
