python - spaCy rule-based matching to identify words related to money/dates in Python

Tags: python nlp spacy

I am new to spaCy. I am trying to find the words that money amounts or dates refer to.

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

doc=nlp("""The client is look at 5x income. He has a loan with $5000 outstanding which can be repaid now this will free up $50 monthly. Credit card outstanding of $60 client will look to pay this off with bonus in September.""")

displacy.render(doc,style="ent",jupyter=True)
displacy.render(doc,style="dep",jupyter=True)

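Based on the dependency output (not shown here), I am trying to find the words that the money amounts and dates refer to (for example, $60 -> credit card outstanding). After reading many tutorials (including spaCy's own) and blog posts, I think I should use dependency-rule-based matching. However, it seems I need to spell out the number (the money amount) in the pattern with a specific structure (for example, a pattern structure for $10000). Can we create a single pattern that works for any money entity?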

Also, could someone help me build a pattern for $60 and $5000? Thanks.

Best answer

You can use something like this:

import spacy
nlp = spacy.load("en_core_web_sm")

# Merge noun phrases and entities for easier analysis
nlp.add_pipe("merge_entities")
nlp.add_pipe("merge_noun_chunks")

doc=nlp("""The client is look at 5x income. He has a loan with $5000 outstanding which can be repaid now this will free up $50 monthly. Credit card outstanding of $60 client will look to pay this off with bonus in September.""")

for token in doc:
    if token.ent_type_ == "MONEY":
        # The amount is an attribute or a direct object, so look to the left
        # of its head for a subject (or an adjectival modifier)
        if token.dep_ in ("attr", "dobj"):
            subj = [w for w in token.head.lefts if w.dep_ in ("nsubj", "amod")]
            if subj:
                print(subj[0], "-->", token)
        # The amount is the object of a preposition, so report the word
        # that the preposition attaches to
        elif token.dep_ == "pobj" and token.head.dep_ == "prep":
            print(token.head.head, "-->", token)

Output:

a loan --> 5000
this --> 50
Credit card --> 60
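
The question also asks whether one pattern can cover any money entity. A minimal sketch of that idea with spaCy's DependencyMatcher follows: instead of spelling out the amount, the pattern matches on the entity label (ENT_TYPE), so it applies to any value the NER component tags as MONEY or DATE. It only covers the "object of a preposition" case from the loop above. To keep the sketch reproducible without a trained model, the parse below is annotated by hand, which is an assumption for illustration only; the sentence fragment, the pattern name LINKED_ENTITY, the helper find_links, and the restriction of the related word to nouns are likewise hypothetical choices, not part of the original answer.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")  # vocab only; no trained model is needed for this sketch

# Hand-annotated parse (assumed, for illustration) of:
#   "loan with 5000 and bonus in September"
doc = Doc(
    nlp.vocab,
    words=["loan", "with", "5000", "and", "bonus", "in", "September"],
    heads=[0, 0, 1, 0, 0, 4, 5],  # absolute index of each token's head
    deps=["ROOT", "prep", "pobj", "cc", "conj", "prep", "pobj"],
    pos=["NOUN", "ADP", "NUM", "CCONJ", "NOUN", "ADP", "PROPN"],
    ents=["O", "O", "B-MONEY", "O", "O", "O", "B-DATE"],
)

pattern = [
    # Anchor: any token labelled MONEY or DATE that is a prepositional object.
    {
        "RIGHT_ID": "value",
        "RIGHT_ATTRS": {"ENT_TYPE": {"IN": ["MONEY", "DATE"]}, "DEP": "pobj"},
    },
    # "value" is an immediate dependent of a preposition.
    {
        "LEFT_ID": "value",
        "REL_OP": "<",
        "RIGHT_ID": "prep",
        "RIGHT_ATTRS": {"DEP": "prep"},
    },
    # The preposition is an immediate dependent of a noun: the related word.
    {
        "LEFT_ID": "prep",
        "REL_OP": "<",
        "RIGHT_ID": "owner",
        "RIGHT_ATTRS": {"POS": {"IN": ["NOUN", "PROPN"]}},
    },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("LINKED_ENTITY", [pattern])


def find_links(doc):
    """Return sorted (related word, value, entity label) triples."""
    links = []
    for _match_id, token_ids in matcher(doc):
        # token_ids follows the order of the nodes in the pattern.
        value, _prep, owner = (doc[i] for i in token_ids)
        links.append((owner.text, value.text, value.ent_type_))
    return sorted(links)


for owner, value, label in find_links(doc):
    print(owner, "-->", value, f"({label})")
```

With a real pipeline you would pass nlp(text) from en_core_web_sm to find_links instead of the hand-built Doc, optionally after adding merge_entities and merge_noun_chunks as in the answer; the matches then depend on the parse and entity labels the model actually produces.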

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/66320574/
