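python - How to match named entities per sentence rather than across the whole file

Tags: python python-3.x

I have a text file, and I used Polyglot NER to extract entities from it. Now I need to split the text into sentences and match the extracted entities against each sentence, printing the matches for every sentence.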

from polyglot.text import Text
file = open('input_raw.txt', 'r')
input_file = file.read()
file = Text(input_file, hint_language_code='fa')

def return_match(entities_list, sentence):       ## Check if Chunks
    for term in entities_list:                  ## are in any of the entities
        ## Check each list in each Chunk object 
        ## and see if there's any matches.
        for entity in sentence.entities:
            if entity == term:
                return entity
    return None

def return_list_of_entities(file):
    list_entity = []
    for sentence in file.sentences:
        for entity in sentence.entities:
            list_entity.append(entity)
    return list_entity

list_entity = return_list_of_entities(file)
#sentence_number = 4 # Which sentence to check
for sentence in range(len(file.sentences)):
    sentencess = file.sentences[sentence]


match = return_match(list_entity, sentencess)

if match is not None:
    print("Entity Term " + str(match) +  
          " is in the sentence. '" + str(sentencess)+ "'")
else:
    print("Sentence '" + str(sentencess) + 
          "' doesn't contain any of the terms" + str(list_entity))


Input file:

Bill Gates is the founder of Microsoft.
Trump is the president of the USA.
Bill Gates was a student in Harvard.

When we run NER, the entities look like this:

Entity list:

Bill Gates, Microsoft, Trump, USA, Bill Gates, Harvard

When we match the entities against the first sentence, it gives:

Current output:

(Bill Gates, Bill Gates, Microsoft)

Expected output:

(Bill Gates, Microsoft) # this is from the first sentence, and the output should continue for the remaining sentences
(Trump, USA) 
(Bill Gates, Harvard)
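
The grouping described in the expected output can be sketched without Polyglot at all. The snippet below is only an illustration under stated assumptions: sentences and entities are plain strings, matching is a simple substring test rather than a comparison of Polyglot chunk objects, and `match_entities_per_sentence` is a hypothetical helper name that does not appear in the original post.

```python
def match_entities_per_sentence(sentences, entities):
    """Return one tuple per sentence holding the entities found in it."""
    # Drop duplicate entity strings while keeping first-seen order.
    unique_entities = list(dict.fromkeys(entities))
    matches = []
    for sentence in sentences:
        # Plain substring test; a stand-in for comparing NER chunks.
        found = tuple(e for e in unique_entities if e in sentence)
        matches.append(found)
    return matches


# Sample data copied from the question (assumed already split into sentences).
sentences = [
    "Bill Gates is the founder of Microsoft.",
    "Trump is the president of the USA.",
    "Bill Gates was a student in Harvard.",
]
entities = ["Bill Gates", "Microsoft", "Trump", "USA", "Bill Gates", "Harvard"]

for found in match_entities_per_sentence(sentences, entities):
    print(found)
```

Removing duplicates from the entity list before matching is what keeps "Bill Gates" from being reported twice for the same sentence.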

Best answer

from polyglot.text import Text

# Read the input once; the with-block closes the file handle afterwards.
with open('input_raw.txt', 'r') as f:
    input_file = f.read()
# Language hint kept from the question; use the code that matches your text.
file = Text(input_file, hint_language_code='fa')

def return_match(entities_list, sentence):
    # Collect the distinct entities of this sentence that are in entities_list.
    result = []
    for entity in sentence.entities:
        if entity in entities_list and str(entity) not in result:
            result.append(str(entity))
    return result

def return_list_of_entities(file):
    # Gather the entities of every sentence into one flat list.
    list_entity = []
    for sentence in file.sentences:
        for entity in sentence.entities:
            list_entity.append(entity)
    return list_entity

list_entity = return_list_of_entities(file)

# Match sentence by sentence instead of stopping after the first one.
for sentence in file.sentences:
    result = return_match(list_entity, sentence)
    print("Entity Term " + str(result) + " is in the sentence. '" + str(sentence) + "'")
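
Because the full entity list is itself built from each sentence's `entities`, every entity of a sentence is in that list by construction, so the per-sentence grouping can be read directly off the sentences. The following is a rough sketch of that idea, not the answer author's method: `entities_by_sentence` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `.sentences` and `.entities` attributes used in the code above, so the example runs without Polyglot or its models.

```python
from types import SimpleNamespace


def entities_by_sentence(text):
    """Pair each sentence with its own entities, duplicates removed."""
    pairs = []
    for sentence in text.sentences:
        seen = []
        for entity in sentence.entities:
            # Keep the first occurrence only, preserving order.
            if entity not in seen:
                seen.append(entity)
        pairs.append((sentence, seen))
    return pairs


# Hypothetical stand-in for a parsed document (not a real Polyglot object).
demo = SimpleNamespace(sentences=[
    SimpleNamespace(entities=["Bill Gates", "Microsoft"]),
    SimpleNamespace(entities=["Trump", "USA"]),
    SimpleNamespace(entities=["Bill Gates", "Harvard", "Bill Gates"]),
])

for _, found in entities_by_sentence(demo):
    print(tuple(found))
```

With a real parsed text, the same function should work as long as the object exposes `sentences` and each sentence exposes `entities`, as the code in this thread assumes.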

For more on "python - How to match named entities per sentence rather than across the whole file", see the similar question on Stack Overflow: https://stackoverflow.com/questions/55610352/
