I am new to Python and NLTK. I adapted the code from https://gist.github.com/alexbowe/879414 into the code below so that it can run over many documents/text blocks, but I get the following error:
Traceback (most recent call last):
  File "E:/NLP/PythonProgrames/NPExtractor/AdvanceMain.py", line 16, in <module>
    result = np_extractor.extract()
  File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 67, in extract
    for term in terms:
  File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 60, in get_terms
    for leaf in self.leaves(tree):
TypeError: leaves() takes 1 positional argument but 2 were given
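Can anyone help me solve this? I have to extract noun phrases from millions of product reviews. I used the Stanford NLP toolkit from Java, but it was very slow, so I thought NLTK in Python would be better. If there is a better solution, please recommend it as well.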
import nltk
from nltk.corpus import stopwords
stopwords = stopwords.words('english')
grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns
    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
lemmatizer = nltk.WordNetLemmatizer()
stemmer = nltk.stem.porter.PorterStemmer()
class NounPhraseExtractor(object):
    def __init__(self, sentence):
        self.sentence = sentence
    def execute(self):
        # Taken from Su Nam Kim Paper...
        chunker = nltk.RegexpParser(grammar)
        #toks = nltk.regexp_tokenize(text, sentence_re)
        # #postoks = nltk.tag.pos_tag(toks)
        toks = nltk.word_tokenize(self.sentence)
        postoks = nltk.tag.pos_tag(toks)
        tree = chunker.parse(postoks)
        return tree
    def leaves(tree):
        """Finds NP (nounphrase) leaf nodes of a chunk tree."""
        for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
            yield subtree.leaves()
    def normalise(word):
        """Normalises words to lowercase and stems and lemmatizes it."""
        word = word.lower()
        word = stemmer.stem_word(word)
        word = lemmatizer.lemmatize(word)
        return word
    def acceptable_word(word):
        """Checks conditions for acceptable word: length, stopword."""
        accepted = bool(2 <= len(word) <= 40
                        and word.lower() not in stopwords)
        return accepted
    def get_terms(self,tree):
        for leaf in self.leaves(tree):
            term = [self.normalise(w) for w, t in leaf if self.acceptable_word(w)]
            yield term
    def extract(self):
        terms = self.get_terms(self.execute())
        matches = []
        for term in terms:
            for word in term:
                matches.append(word)
        return matches
Best answer
You need to either:
- decorate each of normalise, acceptable_word, and leaves with @staticmethod, or
- add a self parameter as the first parameter of these methods.
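You are calling self.leaves, which passes self as an implicit first argument to the leaves method (but your method is declared with only a single parameter, tree). Making these methods static, or adding a self parameter to them, will fix the issue.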
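The mechanism can be reproduced without NLTK. The following is only a minimal sketch: the class name `Broken` and the helper `call_and_capture` are made up for illustration and are not part of the question's code.

```python
class Broken:
    def leaves(tree):  # no `self`: the only declared parameter is `tree`
        return list(tree)

    def get_terms(self, tree):
        # `self.leaves(tree)` is equivalent to `Broken.leaves(self, tree)`,
        # so two arguments arrive at a function that declares only one.
        return self.leaves(tree)


def call_and_capture(obj, tree):
    """Return the TypeError message raised by obj.get_terms, or None."""
    try:
        obj.get_terms(tree)
    except TypeError as exc:
        return str(exc)
    return None


message = call_and_capture(Broken(), ["a", "b"])
print(message)
```

The printed message should end with "takes 1 positional argument but 2 were given", as in the traceback above; newer Python versions prefix the function name with the class name.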
(Your later calls to self.acceptable_word and self.normalise will run into the same problem.)
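Both fixes could look roughly like this. This is a simplified sketch rather than the full extractor: the stemming and lemmatizing steps are left out, `STOPWORDS` is a small stand-in for NLTK's stopword list, `get_terms` receives a ready-made list of tagged leaves instead of a chunk tree, and the class names are hypothetical.

```python
# Stand-in for nltk.corpus.stopwords.words('english') (assumption for the sketch).
STOPWORDS = {"the", "of", "a"}


class WithStaticMethods:
    """Option 1: helpers that do not use the instance become static methods."""

    @staticmethod
    def normalise(word):
        return word.lower()

    @staticmethod
    def acceptable_word(word):
        return 2 <= len(word) <= 40 and word.lower() not in STOPWORDS

    def get_terms(self, leaves):
        for leaf in leaves:
            # Calling through `self` still works: no instance is passed along.
            yield [self.normalise(w) for w, _tag in leaf if self.acceptable_word(w)]


class WithSelf:
    """Option 2: helpers accept `self` as their first parameter."""

    def normalise(self, word):
        return word.lower()

    def acceptable_word(self, word):
        return 2 <= len(word) <= 40 and word.lower() not in STOPWORDS

    def get_terms(self, leaves):
        for leaf in leaves:
            yield [self.normalise(w) for w, _tag in leaf if self.acceptable_word(w)]


# One noun phrase as a list of (word, tag) pairs, the shape subtree.leaves() yields.
sample_leaves = [[("The", "DT"), ("Battery", "NN"), ("life", "NN")]]
static_terms = list(WithStaticMethods().get_terms(sample_leaves))
self_terms = list(WithSelf().get_terms(sample_leaves))
print(static_terms, self_terms)
```

Since none of the three helpers reads any instance state, @staticmethod describes them more accurately; adding self is the smaller change if they might need the instance later.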
You can read about Python static methods in the docs, or perhaps on an external site, which may be easier to digest.
Regarding python - Extracting noun phrases from NLTK using Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38194579/