python - How to parse nodes and node relationships in pyparsing?

I have built a raw parser by hand, but I would really like to get it working in pyparsing.

There are two types of strings I want to parse.
The first contains only nodes, and the second contains node relationships:

verb node1, node2, ...

and

verb node1->node2->node3

One or more nodes can be specified, and they can be quoted.
In addition, you can indicate that one node is inside another node by adding ^.

verb node1, node2 ^ node3, node4

You may also want to indicate node relationships using the ->, <- or <-> indicators.

verb node1->node2<->node3

Again, you can use ^ to indicate that one node is inside another node.

verb node1->node2^node4<->node3

Best answer

A conceptual BNF for this format looks like this:

node :: word composed of alphas, digits, '_'
verb :: one of several defined keywords
binop :: '->' | '<-' | '<->'
nodeFactor :: node '^' node | node
nodeExpr :: nodeFactor [binop nodeFactor]...
nodeCommand :: verb nodeExpr [',' nodeExpr]...

This maps to pyparsing almost step for step:

from pyparsing import (Word,alphas,alphanums,Keyword,
    infixNotation,opAssoc,oneOf,delimitedList,
    ParseException,ParseResults)

nodeRef = Word(alphas,alphanums+'_')
GO, TURN, FOLLOW = map(Keyword, "GO TURN FOLLOW".split())
verb = GO | TURN | FOLLOW
binop = oneOf('-> <- <->')

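The next part is most easily implemented using pyparsing's infixNotation method (formerly known as operatorPrecedence). infixNotation lets us define a hierarchy of operations, and it groups the parsed output according to the precedence defined by that hierarchy; operators listed earlier bind more tightly. I am assuming that your '^' "is inside" operator should be evaluated before the binary '->' and similar operators. infixNotation also allows nesting within parentheses, but none of your examples show that this is strictly needed. You define an infixNotation by specifying the base operand type, followed by a list of tuples, each giving the operator, a value of 1, 2, or 3 for a unary, binary, or ternary operator, and the constant opAssoc.LEFT or opAssoc.RIGHT for the operator's left or right associativity: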

nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT),
    ])
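
As a quick check of the grouping, here is a minimal sketch that repeats the nodeRef, binop, and nodeExpr definitions from above and parses single expressions. The two sample strings are chosen only for illustration; the parenthesized one is not taken from the question.

```python
from pyparsing import Word, alphas, alphanums, infixNotation, opAssoc, oneOf

nodeRef = Word(alphas, alphanums + '_')
binop = oneOf('-> <- <->')

nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),      # listed first, so it binds most tightly
    (binop, 2, opAssoc.LEFT),
    ])

# '^' groups node2 and node4 before the arrow operators see them
grouped = nodeExpr.parseString("node1->node2^node4<->node3", parseAll=True).asList()
print(grouped)
# [['node1', '->', ['node2', '^', 'node4'], '<->', 'node3']]

# parentheses override the default precedence
nested = nodeExpr.parseString("(node1->node2)^node3", parseAll=True).asList()
print(nested)
```

The first result matches the grouping shown for the same sub-expression in the test output further down.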

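Finally, we define the overall expression, which I interpreted as some kind of command. The comma-separated list of node expressions could be implemented directly as nodeExpr + ZeroOrMore(Suppress(',') + nodeExpr) (we suppress the commas from the parsed output; they are useful at parse time, but afterward we would just have to skip over them). This pattern comes up so often that pyparsing provides the method delimitedList: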

nodeCommand = verb('verb') + delimitedList(nodeExpr)('nodes')

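The names 'verb' and 'nodes' cause the results parsed by the respective expressions to be associated with those names, which makes the parsed data easier to work with once parsing is done.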
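
To make the results names concrete, here is a small sketch that parses one command and reads the named results back as attributes. It also builds the hand-written ZeroOrMore form of the list to show that it yields the same nodes; manualList and manualCommand are names introduced here only for the comparison.

```python
from pyparsing import (Word, alphas, alphanums, Keyword, infixNotation,
    opAssoc, oneOf, delimitedList, ZeroOrMore, Suppress)

nodeRef = Word(alphas, alphanums + '_')
GO, TURN, FOLLOW = map(Keyword, "GO TURN FOLLOW".split())
verb = GO | TURN | FOLLOW
binop = oneOf('-> <- <->')
nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT),
    ])

# the version from the answer, using delimitedList
nodeCommand = verb('verb') + delimitedList(nodeExpr)('nodes')

# hand-written equivalent of the comma-separated list (illustration only)
manualList = nodeExpr + ZeroOrMore(Suppress(',') + nodeExpr)
manualCommand = verb('verb') + manualList('nodes')

text = "GO node1,node2^node3,node4"
result = nodeCommand.parseString(text, parseAll=True)
verbName = result.verb               # the token matched by verb
nodeList = result.nodes.asList()     # the tokens matched by the node list
manualNodes = manualCommand.parseString(text, parseAll=True).nodes.asList()

print(verbName)
print(nodeList)
print(manualNodes == nodeList)
```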

Now to test the parser:

tests = """\
    GO node1,node2
    TURN node1->node2->node3
    GO node1,node2^node3,node4
    FOLLOW node1->node2<->node3
    GO node5,node1->node2^node4<->node3,node6
    """.splitlines()
for test in tests:
    test = test.strip()
    if not test:
        continue
    print (test)
    try:
        result = nodeCommand.parseString(test, parseAll=True)
        print (result.dump())
    except ParseException as pe:
        print ("Failed:", test)
        print (pe)

The dump() method prints the parsed tokens as a nested list, then lists each results name and its attached value:

GO node1,node2
['GO', 'node1', 'node2']
- nodes: ['node1', 'node2']
- verb: GO
TURN node1->node2->node3
['TURN', ['node1', '->', 'node2', '->', 'node3']]
- nodes: [['node1', '->', 'node2', '->', 'node3']]
- verb: TURN
GO node1,node2^node3,node4
['GO', 'node1', ['node2', '^', 'node3'], 'node4']
- nodes: ['node1', ['node2', '^', 'node3'], 'node4']
- verb: GO
FOLLOW node1->node2<->node3
['FOLLOW', ['node1', '->', 'node2', '<->', 'node3']]
- nodes: [['node1', '->', 'node2', '<->', 'node3']]
- verb: FOLLOW
GO node5,node1->node2^node4<->node3,node6
['GO', 'node5', ['node1', '->', ['node2', '^', 'node4'], '<->', 'node3'], 'node6']
- nodes: ['node5', ['node1', '->', ['node2', '^', 'node4'], '<->', 'node3'], 'node6']
- verb: GO

At this point, you could just parse your commands and then, based on the verb, dispatch to whatever method is appropriate for performing that verb.
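
One way such a dispatch might look is a plain dictionary from verb to handler function. This is only a sketch: do_go, do_turn, do_follow, handlers, and run are hypothetical names, and each handler just returns a description instead of doing real work.

```python
from pyparsing import (Word, alphas, alphanums, Keyword, infixNotation,
    opAssoc, oneOf, delimitedList)

nodeRef = Word(alphas, alphanums + '_')
GO, TURN, FOLLOW = map(Keyword, "GO TURN FOLLOW".split())
verb = GO | TURN | FOLLOW
binop = oneOf('-> <- <->')
nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT),
    ])
nodeCommand = verb('verb') + delimitedList(nodeExpr)('nodes')

# hypothetical handlers, one per verb
def do_go(nodes):
    return "going to %r" % (nodes,)

def do_turn(nodes):
    return "turning along %r" % (nodes,)

def do_follow(nodes):
    return "following %r" % (nodes,)

handlers = {'GO': do_go, 'TURN': do_turn, 'FOLLOW': do_follow}

def run(command):
    """Parse one command string and call the handler registered for its verb."""
    result = nodeCommand.parseString(command, parseAll=True)
    return handlers[result.verb](result.nodes.asList())

print(run("TURN node1->node2"))
# turning along [['node1', '->', 'node2']]
```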

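But let me suggest a structure that I have found helpful for capturing this logic using Python objects. Define a simple hierarchy of command classes that implement the various verb functions in an abstract method doCommand: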

# base class
class Command(object):
    def __init__(self, tokens):
        self.cmd = tokens.verb
        self.nodeExprs = tokens.nodes

    def doCommand(self):
        """
        Execute command logic, using self.cmd and self.nodeExprs.
        To be overridden in sub classes.
        """
        print (self.cmd, '::', self.nodeExprs.asList())

# these should implement doCommand, but not needed for this example
class GoCommand(Command): pass
class TurnCommand(Command): pass
class FollowCommand(Command): pass
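
For illustration, a subclass override might look like the sketch below. The behavior is made up: this GoCommand simply collects every node name mentioned in its expressions, skipping the operator strings. OPERATORS and nodeNames are hypothetical helpers, not part of pyparsing.

```python
from pyparsing import (Word, alphas, alphanums, Keyword, infixNotation,
    opAssoc, oneOf, delimitedList)

nodeRef = Word(alphas, alphanums + '_')
GO, TURN, FOLLOW = map(Keyword, "GO TURN FOLLOW".split())
verb = GO | TURN | FOLLOW
binop = oneOf('-> <- <->')
nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT),
    ])
nodeCommand = verb('verb') + delimitedList(nodeExpr)('nodes')

OPERATORS = ('^', '->', '<-', '<->')

def nodeNames(expr):
    """Collect node names from a nested expression, skipping operator strings."""
    if isinstance(expr, str):
        return [] if expr in OPERATORS else [expr]
    names = []
    for item in expr:
        names.extend(nodeNames(item))
    return names

class Command(object):
    def __init__(self, tokens):
        self.cmd = tokens.verb
        self.nodeExprs = tokens.nodes

class GoCommand(Command):
    def doCommand(self):
        # hypothetical behavior: report every node this command mentions
        names = nodeNames(self.nodeExprs.asList())
        print(self.cmd, 'visits', ', '.join(names))
        return names

tokens = nodeCommand.parseString("GO node1,node2^node3,node4", parseAll=True)
visited = GoCommand(tokens).doCommand()
# GO visits node1, node2, node3, node4
```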

This method converts your parsed results into an instance of the appropriate command class:

verbClassMap = {
    'GO' : GoCommand,
    'TURN' : TurnCommand,
    'FOLLOW' : FollowCommand,
    }
def tokensToCommand(tokens):
    cls = verbClassMap[tokens.verb]
    return cls(tokens)

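But you can also build this into your parser as a parse-time callback, so that once parsing is done you get not just a list of strings and sublists, but an object that is ready to be "executed" by calling its doCommand method. To do this, just attach tokensToCommand as a parse action on the overall nodeCommand expression: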

nodeCommand.setParseAction(tokensToCommand)

Now we modify our test code slightly:

for test in tests:
    test = test.strip()
    if not test:
        continue
    try:
        result = nodeCommand.parseString(test, parseAll=True)
        result[0].doCommand()
    except ParseException as pe:
        print ("Failed:", test)
        print (pe)

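Since we have not actually implemented doCommand on the subclasses, all we get is the default base class behavior, which just echoes back the parsed verb and node list: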

GO :: ['node1', 'node2']
TURN :: [['node1', '->', 'node2', '->', 'node3']]
GO :: ['node1', ['node2', '^', 'node3'], 'node4']
FOLLOW :: [['node1', '->', 'node2', '<->', 'node3']]
GO :: ['node5', ['node1', '->', ['node2', '^', 'node4'], '<->', 'node3'], 'node6']

(This code was run with Python 3 and pyparsing 2.0.0. It will also run with Python 2 and pyparsing 1.5.7.)

EDIT

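To get your chained expressions a op b op c into [a,op,b], [b,op,c], use a parse action to restructure the parsed [a,op,b,op,c] results into pairwise expressions. The infixNotation method lets you attach a parse action to a level in the operator hierarchy.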

A method to restructure the chained expression results looks like this:

def expandChainedExpr(tokens):
    ret = ParseResults([])
    tokeniter = iter(tokens[0])
    lastexpr = next(tokeniter)
    for op,nextexpr in zip(tokeniter,tokeniter):
        ret += ParseResults([[lastexpr, op, nextexpr]])
        lastexpr = nextexpr
    return ret

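This builds up an entirely new ParseResults to replace the original chained results. Notice how each lastexpr op nextexpr triple is saved as its own subgroup, then nextexpr is copied to lastexpr, and the loop goes around to get the next op-nextexpr pair.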
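
The pairing trick is independent of pyparsing: zipping one iterator with itself consumes two items per step. Here is the same loop on a plain Python list, as a minimal sketch; pairwise_triples is a name made up for this illustration.

```python
def pairwise_triples(chain):
    """Split [a, op1, b, op2, c] into [[a, op1, b], [b, op2, c]]."""
    items = iter(chain)
    last = next(items)
    triples = []
    # both arguments are the same iterator, so each step yields (operator, operand)
    for op, nxt in zip(items, items):
        triples.append([last, op, nxt])
        last = nxt
    return triples

print(pairwise_triples(['node1', '->', 'node2', '<->', 'node3']))
# [['node1', '->', 'node2'], ['node2', '<->', 'node3']]
```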

To attach this reformatter to the parser, add it as a fourth element of the tuple for that level of the hierarchy in infixNotation:

nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT, expandChainedExpr),
    ])

Now the output for:

FOLLOW node1->node2<->node3

is expanded to:

FOLLOW :: [['node1', '->', 'node2'], ['node2', '<->', 'node3']]

For python - How to parse nodes and node relationships in pyparsing?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/15154375/
