parsing - John Hughes 的确定性 LL(1) 使用 Arrow 和错误进行解析

我想根据 John Hughes 的论文写一个解析器 Generalizing Monads to Arrows.在通读并尝试重新实现他的代码时，我意识到有些事情不太合理。在一节中，他展示了基于 Swierstra 和 Duponchel 的论文 Deterministic, error-correcting combinator parsers 的解析器实现。使用箭头。他描述的解析器类型是这样的:

data StaticParser ch = SP Bool [ch]
data DynamicParser ch a b = DP (a, [ch]) -> (b, [ch])
data Parser ch a b = P (StaticParser ch) (DynamicParser ch a b)

组合运算符看起来像这样:

(.) :: Parser ch b c -> Parser ch a b -> Parser ch a c
  P (SP e2 st2) (DP f2) . P (SP e1 st1) (DP f1) =
  P (SP (e1 && e2) (st1 `union` if e1 then st2 else []))
    (DP $ f2 . f1)

问题在于解析器的组成q . p “忘记”q的起始符号。我想到的一种可能的解释是休斯期望我们所有的DynamicParser s 为总数，使得符号解析器的类型签名为 symbol :: ch -> Parser ch a (Maybe ch)而不是 symbol :: ch -> Parser ch a ch .这仍然看起来很尴尬，因为我们必须复制信息，将起始符号信息放在 StaticParser 中。和 DynamicParser .另一个问题是几乎所有解析器都有可能抛出异常，这意味着我们将不得不在 Maybe 中花费大量时间。或 Either创建本质上是“单子(monad)不构成问题”。这可以通过重写 DynamicParser 来补救本身来处理失败或作为 Arrow transformer,但这与论文相差甚远。这些问题都没有在论文中得到解决，而且解析器被呈现为好像它显然有效，所以我觉得我必须遗漏一些基本的东西。如果有人能捕获我错过的东西，那将是非常有帮助的。

最佳答案

我认为 Swierstra 和 Duponcheel 描述的确定性解析器与传统解析器有点不同:它们根本不处理失败，只有选择。
另见 invokeDet S&D 论文中的函数:

invokeDet :: Symbol s => DetPar s a -> Input s -> a
invokeDet (_, p) inp = case p inp [] of (a, _) -> a

这个函数显然假设它总是能够找到一个有效的解析。
使用 Hughes 描述的解析器的箭头版本，您可以编写如下示例:

main = do
  let p = symbol 'a' >>> (symbol 'b' <+> symbol 'c')
  print $ invokeDet p "ab"
  print $ invokeDet p "ac"

这将打印预期:

'b'
'c'

但是，如果您编写“失败”解析:

main = do
  let p = symbol 'a' >>> (symbol 'b' <+> symbol 'c')
  print $ invokeDet p "ad"

它仍然会打印:

'c'

为了让这种行为更合理一些，Swierstra 和 Duponcheel 还引入了纠错功能。输出 'c'如果我们假设错误的字符 d已更正为 c在输入中。这需要一个额外的机制，而这个机制可能太复杂而无法包含在 Hughes 的论文中。
我在此处上传了用于获得这些结果的实现:https://gist.github.com/noughtmare/eced4441332784cc8212e9c0adb68b35
有关相同风格的更实用的解析器的更多信息(但不再是确定性的并且不再局限于 LL(1))，我真的很喜欢 "Combinator Parsing: A Short Tutorial"通过 Swierstra。来自第 9.3 节的有趣摘录:

A subtle point here is the question how to deal with monadic parsers. As we described in [13] the static analysis does not go well with monadic computations, since in that case we dynamically build new parses based on the input produced thus far: the whole idea of a static analysis is that it is static. This observation has lead John Hughes to propose arrows for dealing with such situations [7]. It is only recently that we realised that, although our arguments still hold in general, they do not apply to the case of the LL(1) analysis. If we want to compute the symbols which can be recognised as the first symbol by a parser of the form p >>= q then we are only interested in the starting symbols of the right hand side if the left hand side can recognise the empty string; the good news is that in that case we statically know what value will be returned as a witness, and can pass this value on to q, and analyse the result of this call statically too. Unfortunately we will have to take special precautions in case the left hand side operator contains a call to pErrors in one of the empty derivations, since then it is no longer true that the witness of this alternative can be determined statically.

Swierstra 的完整解析器实现可以在 uu-parsinglib 中找到。包，虽然我不知道那里实现了多少扩展。

关于parsing - John Hughes 的确定性 LL(1) 使用 Arrow 和错误进行解析，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/68951481/

parsing - John Hughes 的确定性 LL(1) 使用 Arrow 和错误进行解析

上一篇：c - 为什么这段代码的结果有和没有 "-fsanitize=undefined,address"不同？

下一篇：javascript - 使用 useEffect 操作 DOM 事件在不同屏幕上不响应