python - Rule-based matching with spaCy to identify words related to money/dates in Python

I'm new to spaCy. I'm trying to find the words that relate to money or dates.

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

doc=nlp("""The client is look at 5x income. He has a loan with $5000 outstanding which can be repaid now this will free up $50 monthly. Credit card outstanding of $60 client will look to pay this off with bonus in September.""")

displacy.render(doc,style="ent",jupyter=True)
displacy.render(doc,style="dep",jupyter=True)

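Based on the dependency output (not shown here), I am trying to find the words that each money or date value refers to (for example, $60 -> Credit card outstanding). After reading many tutorials (including spaCy's own) and blog posts, I think I should use dependency-based rule matching. However, it seems I have to spell out the number (the money amount) in the pattern with a specific structure (for example, a pattern shaped for $10000). Can we create a single pattern that covers any money entity?

Also, can someone help me build the pattern for $60 and $5000? Thanks.

Best answer

You can use something like this: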

import spacy
nlp = spacy.load("en_core_web_sm")

# Merge noun phrases and entities for easier analysis
nlp.add_pipe("merge_entities")
nlp.add_pipe("merge_noun_chunks")

doc=nlp("""The client is look at 5x income. He has a loan with $5000 outstanding which can be repaid now this will free up $50 monthly. Credit card outstanding of $60 client will look to pay this off with bonus in September.""")

for token in doc:
    if token.ent_type_ == "MONEY":
        # The MONEY token is an attribute or direct object:
        # look to the left of its head for a subject (or adjectival modifier)
        if token.dep_ in ("attr", "dobj"):
            subj = [w for w in token.head.lefts if w.dep_ in ("nsubj", "amod")]
            if subj:
                print(subj[0], "-->", token)
        # We have a prepositional object with a preposition
        elif token.dep_ == "pobj" and token.head.dep_ == "prep":
            print(token.head.head, "-->", token)

Output:

a loan --> 5000
this --> 50
Credit card --> 60
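
On the question of whether one pattern can cover any money amount: a token-based Matcher pattern can describe the amount by token attributes instead of a literal value. The sketch below is not part of the original answer. It assumes spaCy v3, uses a blank English pipeline (tokenizer only, no trained model), and relies on the tokenizer splitting "$5000" into "$" and "5000". The pattern name MONEY_AMOUNT is made up for illustration. With a trained pipeline that includes NER, a pattern such as [{"ENT_TYPE": "MONEY"}] may be another option.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, so no model download is needed.
nlp = spacy.blank("en")

matcher = Matcher(nlp.vocab)
# A currency symbol followed by a number-like token,
# regardless of how many digits the amount has.
# "MONEY_AMOUNT" is a hypothetical pattern name chosen for this sketch.
matcher.add("MONEY_AMOUNT", [[{"IS_CURRENCY": True}, {"LIKE_NUM": True}]])

text = (
    "He has a loan with $5000 outstanding which can be repaid now "
    "this will free up $50 monthly. Credit card outstanding of $60."
)
doc = nlp(text)

# Each match is a (match_id, start, end) tuple of token offsets.
amounts = [doc[start:end].text for _, start, end in matcher(doc)]
print(amounts)
```

The same pattern should pick up all three dollar amounts in the sample text without naming any of them. Linking each amount to the word it describes still needs the dependency parse, as in the loop above.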

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/66320574/
