When I run the Stanford parser from nltk, I get the following result:
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
But I need it in this form:
S -> VP
VP -> VB NP ADVP
VB -> get
PRP -> me
RB -> now
How can I get this result, perhaps with a recursive function? Is there already a built-in function for it?
Best answer
First, to navigate the tree, see "How to iterate through all nodes of a tree?" and "How to navigate a nltk.tree.Tree?":
>>> from nltk.tree import Tree
>>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
>>> ptree = Tree.fromstring(bracket_parse)
>>> ptree
Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
>>> for subtree in ptree.subtrees():
...     print(subtree)
...
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
(VP (VB get) (NP (PRP me)) (ADVP (RB now)))
(VB get)
(NP (PRP me))
(PRP me)
(ADVP (RB now))
(RB now)
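Since the question asks about a recursive function, the walk above can also be written by hand. The following is only a minimal sketch: `extract_rules` is a hypothetical helper name, not part of NLTK, and it assumes every leaf is a plain string. It emits one rule per subtree in pre-order and prints words without quotes, as in the form the question asks for.

```python
from nltk.tree import Tree


def extract_rules(node):
    """Recursively collect 'LHS -> RHS' strings from an nltk Tree (hypothetical helper)."""
    # Right-hand side: labels of child subtrees, or the word itself for a leaf.
    rhs = [child.label() if isinstance(child, Tree) else child for child in node]
    rules = ["{} -> {}".format(node.label(), " ".join(rhs))]
    # Recurse into child subtrees only; leaves (strings) produce no rules.
    for child in node:
        if isinstance(child, Tree):
            rules.extend(extract_rules(child))
    return rules


ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")
hand_rules = extract_rules(ptree)
for rule in hand_rules:
    print(rule)
```

Note that this also yields the unary rules `NP -> PRP` and `ADVP -> RB`, which the list in the question leaves out.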
What you are looking for, though, is Tree.productions(), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341 :
>>> ptree.productions()
[S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']
Note that Tree.productions() returns a list of Production objects, see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236 .
If you want the grammar rules as strings, you can do this:
>>> for rule in ptree.productions():
...     print(rule)
...
S -> VP
VP -> VB NP ADVP
VB -> 'get'
NP -> PRP
PRP -> 'me'
ADVP -> RB
RB -> 'now'
Or:
>>> rules = [str(p) for p in ptree.productions()]
>>> rules
['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
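The strings above keep quotes around the words (`'get'`), while the form in the question has none. One possible way to drop them, sketched here under the assumption that every right-hand-side symbol is either an `nltk.grammar.Nonterminal` or a plain string, is to format each production yourself with `lhs()` and `rhs()`. `format_rule` is a hypothetical helper name, not an NLTK function.

```python
from nltk.grammar import Nonterminal
from nltk.tree import Tree


def format_rule(production):
    """Render a Production as 'LHS -> RHS' without quoting terminals (hypothetical helper)."""
    rhs = [
        sym.symbol() if isinstance(sym, Nonterminal) else str(sym)
        for sym in production.rhs()
    ]
    return "{} -> {}".format(production.lhs().symbol(), " ".join(rhs))


ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")
plain_rules = [format_rule(p) for p in ptree.productions()]
for rule in plain_rules:
    print(rule)
```

Without the quotes a terminal can no longer be told apart from a nonterminal with the same spelling, so this form is better suited to display than to rebuilding a grammar.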
Source: "python - Extracting grammar rules from parse results", a question on Stack Overflow: https://stackoverflow.com/questions/33140945/