python - Any efficient way to find surrounding ADJ with respect to a target phrase in Python?

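I am doing sentiment analysis on given documents, and my goal is to find the adjectives that are closest to, or surrounding, a target phrase in each sentence. I know how to extract the words surrounding a target phrase, but how do I find the adjective, NNP, VBN, or other POS-tagged word that is relatively close or closest to the target phrase?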

Here is a rough idea of how I get the surrounding words with respect to my target phrase.

# Lists (not sets) keep each sentence aligned with its target phrase
sentence_List = ["Obviously one of the most important features of any computer is the human interface.",
                 "Good for everyday computing and web browsing.",
                 "My problem was with DELL Customer Service",
                 "I play a lot of casual games online[comma] and the touchpad is very responsive"]

target_phraseList = ["human interface", "everyday computing", "DELL Customer Service", "touchpad"]

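Note that my original dataset is given as a DataFrame containing the list of sentences and the corresponding target phrases. Here I just simulate the data as follows: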

import pandas as pd
# Index each sentence by its target phrase
df = pd.Series(sentence_List, index=target_phraseList)
df = pd.DataFrame(df)

Here I tokenize the sentences as follows:

from nltk.tokenize import word_tokenize
tokenized_sents = [word_tokenize(i) for i in sentence_List]
tokenized = list(tokenized_sents)  # shallow copy of the token lists

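Then I tried to find the surrounding words with respect to my target phrase, following the approach linked here. However, I want to find the adjective, verb, or VBN that is relatively close or closest to my target phrase. How can I do this? Any ideas on how to get it done? Thanks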

Best answer

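Would something like the following work for you? I recognize that some tweaks are needed to make it fully useful (checking upper/lowercase; also, if there is a tie, it returns the word earlier in the sentence rather than the later one), but hopefully it is useful enough to get you started: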

import nltk
from nltk.tokenize import MWETokenizer

def smart_tokenizer(sentence, target_phrase):
    """
    Tokenize a sentence using a full target phrase.
    """
    tokenizer = MWETokenizer()
    target_tuple = tuple(target_phrase.split())
    tokenizer.add_mwe(target_tuple)
    token_sentence = nltk.pos_tag(tokenizer.tokenize(sentence.split()))

    # MWETokenizer joins the words of a matched phrase with its separator,
    # which is '_' by default, so look for the phrase in that joined form
    temp_phrase = target_phrase.replace(' ', '_')
    target_index = [i for i, y in enumerate(token_sentence) if y[0] == temp_phrase]
    if len(target_index) == 0:
        return None, None
    else:
        return token_sentence, target_index[0]


def search(text_tag, tokenized_sentence, target_index):
    """
    Search for a part of speech (POS) nearest a target phrase of interest.
    Returns None if no token carries that tag.
    """
    # Start at distance 1 so the target phrase itself is never returned
    for distance in range(1, len(tokenized_sentence)):
        behind = target_index - distance
        ahead = target_index + distance
        # entry[0] is the word; entry[1] is the POS
        # Check behind first, so a tie returns the earlier word
        if behind >= 0 and tokenized_sentence[behind][1] == text_tag:
            return tokenized_sentence[behind][0]
        if ahead < len(tokenized_sentence) and tokenized_sentence[ahead][1] == text_tag:
            return tokenized_sentence[ahead][0]
    return None

x, i = smart_tokenizer(sentence='My problem was with DELL Customer Service',
                       target_phrase='DELL Customer Service')
print(search('NN', x, i))

y, j = smart_tokenizer(sentence="Good for everyday computing and web browsing.",
                       target_phrase="everyday computing")
print(search('NN', y, j))
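
Since the question asks for adjectives, verbs, or VBN rather than one exact tag, a prefix match on the tag may be more convenient: in the Penn Treebank tagset that nltk.pos_tag uses, 'JJ' would then also cover JJR and JJS, and 'VB' would cover VBN, VBZ, and so on. The following is only a sketch of that idea, not part of the code above; the function name nearest_by_tag and the hand-written tags are made up for illustration, and it expects a list of (word, tag) pairs such as the one smart_tokenizer returns.

```python
def nearest_by_tag(tagged_tokens, target_index, tag_prefix):
    """Return (word, tag, distance) for the token closest to target_index
    whose POS tag starts with tag_prefix, or None if there is none.

    tagged_tokens: list of (word, tag) pairs, e.g. the output of nltk.pos_tag.
    """
    best = None
    for i, (word, tag) in enumerate(tagged_tokens):
        # Skip the target itself and every token with a non-matching tag
        if i == target_index or not tag.startswith(tag_prefix):
            continue
        distance = abs(i - target_index)
        # Strict '<' keeps the earlier token when two candidates tie
        if best is None or distance < best[2]:
            best = (word, tag, distance)
    return best


# Hand-written tags for illustration only; a real tagger may label these differently
tagged = [("the", "DT"), ("touchpad", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("responsive", "JJ")]
print(nearest_by_tag(tagged, 1, "JJ"))
```

Returning the distance along with the word makes it easy to discard matches that are too far from the target phrase to be related to it.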

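Edit: I made some changes to address the issue of using a target phrase of arbitrary length, as you can see in the smart_tokenizer function. The key there is the nltk.tokenize.MWETokenizer class (for more information, see: Python: Tokenizing with phrases). Hopefully this helps. As an aside, I would challenge the idea that spaCy is necessarily more elegant: at some point, someone has to write the code to get the work done. Either that is the spaCy developers, or it is you rolling your own solution. Their API is rather complicated, so I'll leave that exercise to you.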
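
On the upper/lowercase tweak mentioned earlier: the comparison y[0] == temp_phrase in smart_tokenizer is exact, so a target phrase whose case differs from the sentence is not found. One possible way around that, sketched here in plain Python rather than with MWETokenizer, is to locate the phrase as a span of token indices after lowercasing both sides. The helper name find_phrase_span is hypothetical, and it assumes whitespace-split tokens with no punctuation attached.

```python
def find_phrase_span(tokens, phrase):
    """Return (start, end) token indices of the first case-insensitive
    match of phrase in tokens (end is exclusive), or None if absent."""
    words = [w.lower() for w in phrase.split()]
    lowered = [t.lower() for t in tokens]
    n = len(words)
    if n == 0:
        return None
    # Slide a window of n tokens across the sentence
    for start in range(len(lowered) - n + 1):
        if lowered[start:start + n] == words:
            return start, start + n
    return None


tokens = "My problem was with dell customer service".split()
print(find_phrase_span(tokens, "DELL Customer Service"))  # (4, 7)
```

With a span instead of a single merged token, distances can be measured from start for words before the phrase and from end - 1 for words after it, and the original tokens keep their own POS tags.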

Regarding python - Any efficient way to find surrounding ADJ with respect to a target phrase in Python?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53327804/
