python - Parsing a text file using Python and a grammar list

Tags: python python-3.x parsing nlp text-parsing

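I need to do some parsing: the goal is to create grammar rules that will be applied to a corpus. My question: can a rule in the grammar refer to a list?

Example: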

1) Open the text to be analyzed
2) Write the grammatical rules (just an example):
   grammar("""
   S -> NP VP
   NP -> DET N
   VP -> V N
   DET -> list_det.txt
   N -> list_n.txt
   V -> list.txt""")
3) Print the result with the entries that obey this grammar

Is this possible?

Best answer

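Here is a quick conceptual prototype of your grammar using pyparsing. I can't tell from your question what the contents of the N, V, and DET lists might be, so I arbitrarily chose words made up of 'n's and 'v's, plus the literal 'det'. You can replace the <<= assignments with the correct expressions for your grammar, but this parser and sample string should show that your grammar is at least feasible. (If you edit your question to show what the N, V, and DET lists are, I can update this answer with less arbitrary expressions and samples. It would also be useful to include a sample string to be parsed.)

I also added some grouping so that you can see how the grammar structure is reflected in the structure of the results. You can keep it or remove it; the parser will work either way.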

import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

V = pp.Group(pp.OneOrMore(v))
N = pp.Group(pp.OneOrMore(n))
DET = pp.Group(pp.OneOrMore(det))

VP = pp.Group(V + N)
NP = pp.Group(DET + N)
S = NP + VP

# replace these with something meaningful
v <<= pp.Word('v')
n <<= pp.Word('n')
det <<= pp.Literal('det')

sample = 'det det nn nn nn nn vv vv vv nn nn nn nn'

parsed = S.parseString(sample)
print(parsed.asList())

Prints:

[[['det', 'det'], ['nn', 'nn', 'nn', 'nn']], 
 [['vv', 'vv', 'vv'], ['nn', 'nn', 'nn', 'nn']]]
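
The question asks for DET, N, and V to come from list files. One way this could look is sketched below, assuming each file holds one word per line. The helper word_list and the file contents are made up for illustration (only the file names come from the question), and the files are written to a temporary directory so the sketch runs on its own. Each word is wrapped in pp.Keyword so that it only matches as a whole word.

```python
import tempfile
from pathlib import Path

import pyparsing as pp


def word_list(path):
    """Hypothetical helper: match any whole word listed in `path` (one word per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    words = [line.strip() for line in lines if line.strip()]
    return pp.MatchFirst([pp.Keyword(word) for word in words])


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Made-up list contents; substitute your real files here.
    (tmp / "list_det.txt").write_text("the\na\n", encoding="utf-8")
    (tmp / "list_n.txt").write_text("dog\ncat\nfood\n", encoding="utf-8")
    (tmp / "list.txt").write_text("eats\nlikes\n", encoding="utf-8")

    DET = word_list(tmp / "list_det.txt")
    N = word_list(tmp / "list_n.txt")
    V = word_list(tmp / "list.txt")

# Same rules as in the question: S -> NP VP, NP -> DET N, VP -> V N
NP = pp.Group(DET + N)
VP = pp.Group(V + N)
S = NP + VP

corpus = ["the dog eats food", "a cat likes food", "food the eats dog"]

# Keep only the corpus entries that obey the grammar from start to end.
matching = []
for line in corpus:
    try:
        S.parseString(line, parseAll=True)
    except pp.ParseException:
        continue
    matching.append(line)

print(matching)  # ['the dog eats food', 'a cat likes food']
```

Passing parseAll=True makes an entry count as a match only if the whole line fits the grammar, rather than just a leading part of it.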

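EDIT:

I guessed that "NP" and "VP" are "noun phrase" and "verb phrase", but I don't know what "DET" could be. Still, I made up a less abstract example. I also extended the lists to accept more grammatical forms of noun and verb lists, joined with "and" or "or" and commas.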

import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

def collectionOf(expr):
    '''
    Compose a collection expression for a base expression that matches
        expr
        expr and expr
        expr, expr, expr, and expr
    '''
    AND = pp.Literal('and')
    OR = pp.Literal('or')
    COMMA = pp.Suppress(',')
    return expr + pp.Optional(
            pp.Optional(pp.OneOrMore(COMMA + expr) + COMMA) + (AND | OR) + expr)

V = pp.Group(collectionOf(v))('V')
N = pp.Group(collectionOf(n))('N')
DET = pp.Group(pp.OneOrMore(det))('DET')

VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')

# replace these with something meaningful
v <<= pp.Combine(pp.oneOf('chase love hate like eat drink') + pp.Optional(pp.Literal('s')))
n <<= pp.Optional(pp.oneOf('the a my your our his her their')) + pp.oneOf("dog cat horse rabbit squirrel food water")
det <<= pp.Optional(pp.oneOf('why how when where')) + pp.oneOf('do does did')

samples = '''
    when does the dog eat the food
    does the dog like the cat
    do the horse, cat, and dog like or hate their food
    do the horse and dog love the cat
    why did the dog chase the squirrel
'''
S.runTests(samples)

Prints:

when does the dog eat the food
[[[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]]
- S: [[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]
  - NP: [['when', 'does'], ['the', 'dog']]
    - DET: ['when', 'does']
    - N: ['the', 'dog']
  - VP: [['eat'], ['the', 'food']]
    - N: ['the', 'food']
    - V: ['eat']


does the dog like the cat
[[[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]]
- S: [[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]
  - NP: [['does'], ['the', 'dog']]
    - DET: ['does']
    - N: ['the', 'dog']
  - VP: [['like'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['like']


do the horse, cat, and dog like or hate their food
[[[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]]
- S: [[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]
  - NP: [['do'], ['the', 'horse', 'cat', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'cat', 'and', 'dog']
  - VP: [['like', 'or', 'hate'], ['their', 'food']]
    - N: ['their', 'food']
    - V: ['like', 'or', 'hate']


do the horse and dog love the cat
[[[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]]
- S: [[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]
  - NP: [['do'], ['the', 'horse', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'and', 'dog']
  - VP: [['love'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['love']


why did the dog chase the squirrel
[[[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]]
- S: [[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]
  - NP: [['why', 'did'], ['the', 'dog']]
    - DET: ['why', 'did']
    - N: ['the', 'dog']
  - VP: [['chase'], ['the', 'squirrel']]
    - N: ['the', 'squirrel']
    - V: ['chase']
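
The results names attached above ('S', 'NP', 'VP', and so on) can also be used to pull individual parts out of the parsed result, instead of indexing into nested lists. The following is a minimal sketch using a cut-down version of the grammar; the shortened word lists are assumptions made to keep it brief.

```python
import pyparsing as pp

# Cut-down word lists, for illustration only
det = pp.oneOf("do does did")
n = pp.Optional(pp.oneOf("the a")) + pp.oneOf("dog cat food")
v = pp.oneOf("eat like chase")

# Same grouping and naming scheme as the grammar above
DET = pp.Group(det)("DET")
N = pp.Group(n)("N")
V = pp.Group(v)("V")
NP = pp.Group(DET + N)("NP")
VP = pp.Group(V + N)("VP")
S = pp.Group(NP + VP)("S")

result = S.parseString("does the dog like the cat")
sentence = result["S"]

# Each level of grouping is reachable by its results name.
print(sentence["NP"]["DET"].asList())  # ['does']
print(sentence["NP"]["N"].asList())    # ['the', 'dog']
print(sentence["VP"]["V"].asList())    # ['like']
print(sentence["VP"]["N"].asList())    # ['the', 'cat']
```

Because N is used inside both NP and VP, the two noun groups stay separate only as long as the enclosing pp.Group calls are kept.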

Regarding "python - Parsing a text file using Python and a grammar list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45981339/
