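I need to do some parsing: the goal is to create grammar rules that will be applied to a corpus. I have one question: is it possible to have a list inside the grammar?
Example: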
1) Open the text to be analyzed
2) Write the grammatical rules (just an example):
grammar("""
S -> NP VP
NP -> DET N
VP -> V N
DET -> list_det.txt
N -> list_n.txt
V -> list.txt""")
3) Print the result with the entries that obey this grammar
Is this possible?
Best answer
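Here is a quick conceptual prototype of your grammar using pyparsing. I can't tell from your question what the contents of the N, V, and DET lists might be, so I arbitrarily chose words composed of 'n's and 'v's, and the literal word 'det'. You can replace the <<= assignments with the correct expressions for your grammar, but this parser and sample string should show that your grammar is at least feasible. (If you edit your question to show what the N, V, and DET lists are, I can update this answer with less arbitrary expressions and samples. It would also be useful to include a sample string to be parsed.)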
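I also added some grouping so that you can see how the structure of the grammar is reflected in the structure of the results. You can keep it or remove it; the parser will work either way.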
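As a small illustration of what the grouping changes, here is a minimal sketch on a toy expression (the names flat and grouped are made up for this example, and it reuses the arbitrary 'n'/'v' words). Both versions accept the same input; only the shape of the result differs.

```python
import pyparsing as pp

n = pp.Word('n')
v = pp.Word('v')

# Without Group: every matched token lands in one flat list.
flat = pp.OneOrMore(n) + pp.OneOrMore(v)

# With Group: each sub-expression's tokens get their own nested list.
grouped = pp.Group(pp.OneOrMore(n)) + pp.Group(pp.OneOrMore(v))

text = 'nn nn vv'
flat_result = flat.parseString(text).asList()
grouped_result = grouped.parseString(text).asList()

print(flat_result)     # ['nn', 'nn', 'vv']
print(grouped_result)  # [['nn', 'nn'], ['vv']]
```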
import pyparsing as pp
v = pp.Forward()
n = pp.Forward()
det = pp.Forward()
V = pp.Group(pp.OneOrMore(v))
N = pp.Group(pp.OneOrMore(n))
DET = pp.Group(pp.OneOrMore(det))
VP = pp.Group(V + N)
NP = pp.Group(DET + N)
S = NP + VP
# replace these with something meaningful
v <<= pp.Word('v')
n <<= pp.Word('n')
det <<= pp.Literal('det')
sample = 'det det nn nn nn nn vv vv vv nn nn nn nn'
parsed = S.parseString(sample)
print(parsed.asList())
Prints:
[[['det', 'det'], ['nn', 'nn', 'nn', 'nn']],
[['vv', 'vv', 'vv'], ['nn', 'nn', 'nn', 'nn']]]
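To connect this back to the question's list files, here is a rough sketch of one way the placeholder expressions could be built from word lists stored in text files. It assumes one word per line. The file names come from the question, but their contents, the helper word_list_expr, and the sample corpus are all invented for illustration, and the files are written to a temporary directory only so that the snippet is self-contained.

```python
import os
import tempfile
import pyparsing as pp

def word_list_expr(path):
    """Hypothetical helper: match any word listed in the file (one per line)."""
    with open(path, encoding='utf-8') as f:
        words = [line.strip() for line in f if line.strip()]
    return pp.oneOf(words)

with tempfile.TemporaryDirectory() as tmp:
    # Assumed file contents, made up for this sketch.
    lists = {
        'list_det.txt': 'the\na\n',
        'list_n.txt': 'dog\ncat\nfood\n',
        'list.txt': 'eats\nlikes\n',
    }
    for name, content in lists.items():
        with open(os.path.join(tmp, name), 'w', encoding='utf-8') as f:
            f.write(content)

    DET = word_list_expr(os.path.join(tmp, 'list_det.txt'))
    N = word_list_expr(os.path.join(tmp, 'list_n.txt'))
    V = word_list_expr(os.path.join(tmp, 'list.txt'))

# Grammar from the question: S -> NP VP, NP -> DET N, VP -> V N
NP = pp.Group(DET + N)
VP = pp.Group(V + N)
S = NP + VP

# Keep only the corpus entries that obey the grammar.
corpus = ['the dog eats food', 'a cat likes food', 'dog the eats']
matching = []
for sentence in corpus:
    try:
        S.parseString(sentence, parseAll=True)
    except pp.ParseException:
        continue
    matching.append(sentence)

print(matching)  # ['the dog eats food', 'a cat likes food']
```

Note that oneOf matches literal strings, so a listed word can also match the start of a longer word; whether that matters depends on the real lists.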
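EDIT:
I'm guessing that "NP" and "VP" are "noun phrase" and "verb phrase", but I have no idea what "DET" might be. Still, I made up a less abstract example. I also expanded the lists to accept more grammatical forms of noun and verb lists, joined by "and" and commas.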
import pyparsing as pp
v = pp.Forward()
n = pp.Forward()
det = pp.Forward()
def collectionOf(expr):
    '''
    Compose a collection expression for a base expression that matches
        expr
        expr and expr
        expr, expr, expr, and expr
    '''
    AND = pp.Literal('and')
    OR = pp.Literal('or')
    COMMA = pp.Suppress(',')
    return expr + pp.Optional(
        pp.Optional(pp.OneOrMore(COMMA + expr) + COMMA) + (AND | OR) + expr)
V = pp.Group(collectionOf(v))('V')
N = pp.Group(collectionOf(n))('N')
DET = pp.Group(pp.OneOrMore(det))('DET')
VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')
# replace these with something meaningful
v <<= pp.Combine(pp.oneOf('chase love hate like eat drink') + pp.Optional(pp.Literal('s')))
n <<= pp.Optional(pp.oneOf('the a my your our his her their')) + pp.oneOf("dog cat horse rabbit squirrel food water")
det <<= pp.Optional(pp.oneOf('why how when where')) + pp.oneOf('do does did')
samples = '''
when does the dog eat the food
does the dog like the cat
do the horse, cat, and dog like or hate their food
do the horse and dog love the cat
why did the dog chase the squirrel
'''
S.runTests(samples)
Prints:
when does the dog eat the food
[[[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]]
- S: [[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]
  - NP: [['when', 'does'], ['the', 'dog']]
    - DET: ['when', 'does']
    - N: ['the', 'dog']
  - VP: [['eat'], ['the', 'food']]
    - N: ['the', 'food']
    - V: ['eat']

does the dog like the cat
[[[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]]
- S: [[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]
  - NP: [['does'], ['the', 'dog']]
    - DET: ['does']
    - N: ['the', 'dog']
  - VP: [['like'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['like']

do the horse, cat, and dog like or hate their food
[[[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]]
- S: [[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]
  - NP: [['do'], ['the', 'horse', 'cat', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'cat', 'and', 'dog']
  - VP: [['like', 'or', 'hate'], ['their', 'food']]
    - N: ['their', 'food']
    - V: ['like', 'or', 'hate']

do the horse and dog love the cat
[[[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]]
- S: [[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]
  - NP: [['do'], ['the', 'horse', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'and', 'dog']
  - VP: [['love'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['love']

why did the dog chase the squirrel
[[[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]]
- S: [[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]
  - NP: [['why', 'did'], ['the', 'dog']]
    - DET: ['why', 'did']
    - N: ['the', 'dog']
  - VP: [['chase'], ['the', 'squirrel']]
    - N: ['the', 'squirrel']
    - V: ['chase']
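The names attached with ('V'), ('N') and so on are pyparsing results names, and they can be used to pull individual parts out of a parse instead of indexing into nested lists. The following is a reduced sketch of the grammar above (smaller word lists and no collectionOf, chosen just to keep it short), showing both dictionary-style and attribute-style access.

```python
import pyparsing as pp

# Reduced word lists, assumed for this sketch only.
v = pp.oneOf('like eat chase')
n = pp.Optional(pp.oneOf('the a')) + pp.oneOf('dog cat food')
det = pp.oneOf('do does did')

V = pp.Group(v)('V')
N = pp.Group(n)('N')
DET = pp.Group(det)('DET')
VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')

result = S.parseString('does the dog like the cat')

# Dictionary-style access through the nested groups.
det_tokens = result['S']['NP']['DET'].asList()
verb_tokens = result['S']['VP']['V'].asList()

# Attribute-style access does the same thing.
subject = result.S.NP.N.asList()
obj = result.S.VP.N.asList()

print(det_tokens)   # ['does']
print(verb_tokens)  # ['like']
print(subject)      # ['the', 'dog']
print(obj)          # ['the', 'cat']
```

Because each Group keeps its own names, the N inside NP and the N inside VP do not overwrite each other.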
Regarding "python - parsing a text file using Python and a grammar with lists", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45981339/