parsing - Elm parser loop does not terminate

Tags: parsing elm

I have run into a parser recursion problem that I cannot solve. Any suggestions about what is causing it would be greatly appreciated.

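The code below works when the function rawData is defined with a fixed number of elements (as in the commented-out code). When it is defined with Parser.loop, as shown, it never stops (until the stack overflows). The same loop construct works fine for all the other functions (for example files and directories).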

module Reader exposing (..)

import Parser exposing (..)

type TermCmd
    = CD Argument
    | LS


type Argument
    = Home
    | UpOne
    | DownOne String


type Content
    = Dir String (List Content)
    | File Int String String


type alias RawData =
    List ( List TermCmd, List Content )


rawData : Parser RawData
rawData =
    loop [] <| loopHelper dataChunk        -- This never ends...

-- succeed (\a b c d -> [ a, b, c, d ])    -- but this works
--     |= dataChunk
--     |= dataChunk
--     |= dataChunk
--     |= dataChunk


dataChunk : Parser ( List TermCmd, List Content )
dataChunk =
    succeed (\cmds ctnt -> ( cmds, ctnt ))
        |= commands
        |= contents


directory : Parser Content
directory =
    succeed Dir
        |. symbol "dir"
        |. spaces
        |= (chompUntilEndOr "\n"
                |> getChompedString
           )
        |= succeed []
        |. spaces


file : Parser Content
file =
    succeed File
        |= int
        |. spaces
        |= (chompWhile (\c -> c /= '.' && c /= '\n')
                |> getChompedString
           )
        |= (chompUntilEndOr "\n"
                |> getChompedString
                |> Parser.map (String.dropLeft 1)
           )
        |. spaces


command : Parser TermCmd
command =
    succeed identity
        |. symbol "$"
        |. spaces
        |= oneOf
            [ succeed CD
                |. symbol "cd"
                |. spaces
                |= argument
            , succeed LS
                |. symbol "ls"
            ]
        |. spaces


argument : Parser Argument
argument =
    oneOf
        [ succeed UpOne |. symbol ".."
        , succeed Home |. symbol "/"
        , succeed DownOne |= (chompUntilEndOr "\n" |> getChompedString)
        , problem "Bad argument"
        ]
        |. spaces

contents : Parser (List Content)
contents =
    let
        contentHelper revContent =
            oneOf
                [ succeed (\ctnt -> Loop (ctnt :: revContent))
                    |= file
                , succeed (\ctnt -> Loop (ctnt :: revContent))
                    |= directory
                , succeed ()
                    |> map (\_ -> Done (List.reverse revContent))
                ]
    in
    loop [] contentHelper


commands : Parser (List TermCmd)
commands =
    loop [] <| loopHelper command


directories : Parser (List Content)
directories =
    loop [] <| loopHelper directory


files : Parser (List Content)
files =
    loop [] <| loopHelper file


loopHelper : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelper parser revContent =
    oneOf
        [ succeed (\ctnt -> Loop (ctnt :: revContent))
            |= parser
        , succeed ()
            |> map (\_ -> Done (List.reverse revContent))
        ]


sampleInput : String
sampleInput =
    "$ cd /\n$ ls\ndir a\n14848514 b.txt\n8504156 c.dat\ndir d\n$ cd a\n$ ls\ndir e\n29116 f\n2557 g\n62596 h.lst\n$ cd e\n$ ls\n584 i\n$ cd ..\n$ cd ..\n$ cd d\n$ ls\n4060174 j\n8033020 d.log\n5626152 d.ext\n7214296 k"

The rawData function goes into an infinite loop, yet the same construct (loop [] <| loopHelper parser) works fine everywhere else.

Best answer

You can probably see the problem by running the four-step parser (the one that starts with succeed (\a b c d -> [ a, b, c, d ])) on an empty string. If you do, you get the following result:

Ok [([],[]),([],[]),([],[]),([],[])]

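Take a moment to think about what a five-step parser, a ten-step parser, or even a 100-step parser would give you. loop gives you a parser that can run any number of steps.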
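The zero-consumption step behind that result can be seen in isolation by running a single dataChunk on the empty string. This is only a small sketch: the module name EmptyStepDemo and the value name emptyChunkResult are made up for illustration, and it assumes the question's Reader module compiles as posted.

```elm
module EmptyStepDemo exposing (emptyChunkResult)

import Parser
import Reader exposing (Content, TermCmd, dataChunk)


-- One dataChunk step on empty input. Both inner loops (commands and
-- contents) finish immediately with empty lists, so the step succeeds
-- without consuming a single character.
emptyChunkResult : Result (List Parser.DeadEnd) ( List TermCmd, List Content )
emptyChunkResult =
    Parser.run dataChunk ""



-- emptyChunkResult == Ok ( [], [] )
```

Because this step always succeeds, a loop built on it never reaches its Done branch once the input is exhausted.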

The Elm documentation for the loop function hints at your problem:

Parsers like succeed () and chompWhile Char.isAlpha can succeed without consuming any characters. So in some cases you may want to use getOffset to ensure that each step actually consumed characters. Otherwise you could end up in an infinite loop!

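Your parser runs into an infinite loop because it produces an endless list of tuples, each with an empty list of commands (and an empty list of contents). It consumes no characters when producing each such tuple, so it loops forever.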

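In your case, an empty list of commands doesn't seem to make sense, so we must make sure that an empty list of commands causes the parse to fail.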

One way to do this is to write a variant of loopHelper that fails if the list is empty:

checkNonEmpty : List a -> Parser ()
checkNonEmpty list =
    if List.isEmpty list then
        problem "List is empty"

    else
        succeed ()


loopHelperNonEmpty : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelperNonEmpty parser revContent =
    oneOf
        [ succeed (\ctnt -> Loop (ctnt :: revContent))
            |= parser
        , checkNonEmpty revContent
            |> map (\_ -> Done (List.reverse revContent))
        ]

(I couldn't find a simple way to introduce getOffset here, so I did something slightly different.)

You can then change the definition of commands to use this function instead of loopHelper:

commands : Parser (List TermCmd)
commands =
    loop [] <| loopHelperNonEmpty command

I made this change to your code and it produced the following output:

Ok
    [ ( [ CD Home, LS ]
        , [ Dir "a" [], File 14848514 "b" "txt", File 8504156 "c" "dat", Dir "d" [] ]
        )
    , ( [ CD (DownOne "a"), LS ]
        , [ Dir "e" [], File 29116 "f" "", File 2557 "g" "", File 62596 "h" "lst" ]
        )
    , ( [ CD (DownOne "e"), LS ]
        , [ File 584 "i" "" ]
        )
    , ( [ CD UpOne, CD UpOne, CD (DownOne "d"), LS ]
        , [ File 4060174 "j" "", File 8033020 "d" "log", File 5626152 "d" "ext", File 7214296 "k" "" ]
        )
    ]

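(I've formatted this for clarity. While working on your code I simply used Debug.toString to output the parser's result to the browser window, but that comes out as one long line. I pasted it into VS Code, added a few line breaks, and used elm-format to tidy it up.)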
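For comparison, here is one possible way to follow the getOffset hint from the documentation quoted earlier. It is a rough sketch rather than part of the answer's tested fix: the module name ProgressLoop and the function name loopHelperProgress are hypothetical. The idea is to record the offset before and after each item and to stop the loop as soon as an item succeeds without consuming anything.

```elm
module ProgressLoop exposing (loopHelperProgress)

import Parser exposing ((|=), Parser, Step(..), getOffset, map, oneOf, succeed)


-- Hypothetical variant of loopHelper: an item only counts if the
-- parser's offset moved forward while parsing it.
loopHelperProgress : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelperProgress parser revContent =
    oneOf
        [ succeed (\start item end -> ( start, item, end ))
            |= getOffset
            |= parser
            |= getOffset
            |> map
                (\( start, item, end ) ->
                    if end > start then
                        -- Characters were consumed: keep the item and go on.
                        Loop (item :: revContent)

                    else
                        -- Nothing was consumed: drop the empty item and stop.
                        Done (List.reverse revContent)
                )
        , succeed ()
            |> map (\_ -> Done (List.reverse revContent))
        ]
```

With this helper, rawData could be written as loop [] <| loopHelperProgress dataChunk while commands keeps the original loopHelper. The trade-off is that an empty command list is then silently tolerated instead of being reported as a parse problem, which may or may not be what you want.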

Regarding "parsing - Elm parser loop does not terminate", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75202570/
