parsing - 如何使用 Haskell 解析中缀而不是前缀?

标签 parsing haskell tokenize infix-notation infix-operator

我需要帮助我尝试用 Haskell 编写的这个程序。我已经写了大部分,这里是我基本上想要做的事情:

  • 当我写

  • 解析“a + b”
    在终端中,我希望将其作为输出:
    加(字“a”)(字“b”)
  • 当我写

  • 解析“a - 2 * b + c”
    在终端中,我希望将其作为输出:
    减号(单词“a”)(加号(Mult(Num 2)(单词“b”))(单词“c”))
    到目前为止我的代码:
    data Ast
        = Word String
        | Num Int
        | Mult Ast Ast
        | Plus Ast Ast
        | Minus Ast Ast
        deriving (Eq, Show)
    
    tokenize :: [Char] -> [String]
    tokenize [] = []
    tokenize (' ' : s) = tokenize s
    tokenize ('+' : s) = "+" : tokenize s
    tokenize ('*' : s) = "*" : tokenize s
    tokenize (c : s)
      | isDigit c =
        let (cs, s') = collectWhile isDigit s
         in (c : cs) : tokenize s'
      | isAlpha c =
        let (cs, s') = collectWhile isAlpha s
         in (c : cs) : tokenize s'
      | otherwise = error ("unexpected character " ++ show c)
    
    collectWhile :: (Char -> Bool) -> String -> (String, String)
    collectWhile p s = (takeWhile p s, dropWhile p s)
    
    isDigit, isAlpha :: Char -> Bool
    isDigit c = c `elem` ['0' .. '9']
    isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
    
    parseU :: [String] -> (Ast, [String])
    parseU ("+" : s0) =
      let (e1, s1) = parseU s0
          (e2, s2) = parseU s1
       in (Plus e1 e2, s2)
    parseU ("*" : s0) =
      let (e1, s1) = parseU s0
          (e2, s2) = parseU s1
       in (Mult e1 e2, s2)
    parseU (t : ts)
      | isNumToken t = (Num (read t), ts)
      | isWordToken t = (Word t, ts)
      | otherwise = error ("unrecognized token " ++ show t)
    parseU [] = error "unexpected end of input"
    
    isNumToken, isWordToken :: String -> Bool
    isNumToken xs = takeWhile isDigit xs == xs
    isWordToken xs = takeWhile isAlpha xs == xs
    
    parse :: String -> Ast
    parse s =
      case parseU (tokenize s) of
        (e, []) -> e
        (_, t : _) -> error ("unexpected token " ++ show t)
    
    inn :: Ast -> String
    inn (Plus x y) = innP x ++ " + " ++ innP y
    inn (Mult x y) = innP x ++ " * " ++ innP y
    inn ast = innP ast
    
    innP :: Ast -> String
    innP (Num n) = show n
    innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
    innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
    innP (Word w) = w -- 
    
    innfiks :: String -> String
    innfiks s = inn (parse s)
    
    现在我在终端中发布我写的文本时出错,但是当我这样写时:
    解析“+ a b”
    我得到正确的输出:
    加(字“a”)(字“b”)
    我知道我必须更改代码,以便它接受我在此表单上发送给解析函数的内容:
    值运算符值,
    而不是在这个表格上:
    运算符值
    但我正在努力找出我必须在什么地方做这个改变。

    最佳答案

    为了优先处理中缀运算符,一种方法是引入一系列与优先级对应的解析函数。因此,如果您有可以乘以创建“项”的“因子”,可以添加或减去这些“项”以创建“表达式”,那么您将需要为这些级别中的每一个创建解析器函数。解析“因子”(即单词或数字)很容易,因为您已经编写了代码:

    parseFactor :: [String] -> (Ast, [String])
    parseFactor (t : ts)
      | isNumToken t = (Num (read t), ts)
      | isWordToken t = (Word t, ts)
      | otherwise = error ("unrecognized token " ++ show t)
    parseFactor [] = error "unexpected end of input"
    
    解析一个术语比较棘手。你想先解析一个因子,然后,可选地,一个 *后跟另一个因子,然后将其视为一个项,以进一步可选地乘以另一个因子,依此类推。以下是一种方法:
    parseTerm :: [String] -> (Ast, [String])
    parseTerm ts
      =  let (f1, ts1) = parseFactor ts     -- parse first factor
         in  go f1 ts1
      where go acc ("*":ts2)                -- add a factor to an accumulating term
              = let (f2, ts3) = parseFactor ts2
                in go (Mult acc f2) ts3
            go acc rest = (acc, rest)       -- no more factors: return the term
    
    如果你愿意,试着写一个类似的 parseExpr解析由 + 分隔的术语字符(暂时跳过减法),并在以下内容上进行测试:
    parseExpr (tokenize "2 + 3 * 6 + 4 * 8 * 12 + 1")
    
    对于剧透,这里有一个版本可以同时处理 +- ,但请注意,您的分词器尚未正确处理减法,因此您必须先解决该问题。
    parseExpr :: [String] -> (Ast, [String])
    parseExpr ts
      =  let (f1, ts1) = parseTerm ts
         in  go f1 ts1
      where go acc (op:ts2)
              | op == "+" || op == "-"
              = let (f2, ts3) = parseTerm ts2
                in go ((astOp op) acc f2) ts3
            go acc rest = (acc, rest)
            astOp "+" = Plus
            astOp "-" = Minus
    
    有了这个,你可以指向 parse到正确的解析器:
    parse :: String -> Ast
    parse s =
      case parseExpr (tokenize s) of
        (e, []) -> e
        (_, t : _) -> error ("unexpected token " ++ show t)
    
    你的例子应该有效:
    λ> parse "a - 2 * b + c"
    Plus (Minus (Word "a") (Mult (Num 2) (Word "b"))) (Word "c")
    
    请注意,这与您所说的您想要的输出略有不同,但此顺序对于左关联运算符是正确的(这对于正确处理 - 很重要)。也就是说,你想要:
    5 - 4 + 1
    
    解析为:
    (5 - 4) + 1  -- i.e., (Plus (Minus (Num 5) (Num 4)) (Num 1))
    
    以便评估者计算 2 的正确答案。如果您将其解析为:
    5 - (4 + 1)  -- i.e., (Minus (Num 5) (Plus (Num 4) (Num 1)))
    
    你的评估者会计算出错误的答案 0。
    但是,如果您真的想使用右关联运算符进行解析,请参见下文。
    左关联运算符的完整修改代码:
    data Ast
        = Word String
        | Num Int
        | Mult Ast Ast
        | Plus Ast Ast
        | Minus Ast Ast
        deriving (Eq, Show)
    
    tokenize :: [Char] -> [String]
    tokenize [] = []
    tokenize (' ' : s) = tokenize s
    tokenize ('-' : s) = "-" : tokenize s
    tokenize ('+' : s) = "+" : tokenize s
    tokenize ('*' : s) = "*" : tokenize s
    tokenize (c : s)
      | isDigit c =
        let (cs, s') = collectWhile isDigit s
         in (c : cs) : tokenize s'
      | isAlpha c =
        let (cs, s') = collectWhile isAlpha s
         in (c : cs) : tokenize s'
      | otherwise = error ("unexpected character " ++ show c)
    
    collectWhile :: (Char -> Bool) -> String -> (String, String)
    collectWhile p s = (takeWhile p s, dropWhile p s)
    
    isDigit, isAlpha :: Char -> Bool
    isDigit c = c `elem` ['0' .. '9']
    isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
    
    parseFactor :: [String] -> (Ast, [String])
    parseFactor (t : ts)
      | isNumToken t = (Num (read t), ts)
      | isWordToken t = (Word t, ts)
      | otherwise = error ("unrecognized token " ++ show t)
    parseFactor [] = error "unexpected end of input"
    
    parseTerm :: [String] -> (Ast, [String])
    parseTerm ts
      =  let (f1, ts1) = parseFactor ts
         in  go f1 ts1
      where go acc ("*":ts2)
              = let (f2, ts3) = parseFactor ts2
                in go (Mult acc f2) ts3
            go acc rest = (acc, rest)
    
    parseExpr :: [String] -> (Ast, [String])
    parseExpr ts
      =  let (f1, ts1) = parseTerm ts
         in  go f1 ts1
      where go acc (op:ts2)
              | op == "+" || op == "-"
              = let (f2, ts3) = parseTerm ts2
                in go ((astOp op) acc f2) ts3
            go acc rest = (acc, rest)
            astOp "+" = Plus
            astOp "-" = Minus
    
    isNumToken, isWordToken :: String -> Bool
    isNumToken xs = takeWhile isDigit xs == xs
    isWordToken xs = takeWhile isAlpha xs == xs
    
    parse :: String -> Ast
    parse s =
      case parseExpr (tokenize s) of
        (e, []) -> e
        (_, t : _) -> error ("unexpected token " ++ show t)
    
    对于右结合运算符,修改这些定义:
    parseTerm :: [String] -> (Ast, [String])
    parseTerm ts
      =  let (fct, ts1) = parseFactor ts
         in  case ts1 of
               "*":ts2 -> let (trm, rest) = parseTerm ts2
                          in  (Mult fct trm, rest)
               _       -> (fct, ts1)
    
    parseExpr :: [String] -> (Ast, [String])
    parseExpr ts
      =  let (trm, ts1) = parseTerm ts
         in  case ts1 of
               op:ts2 | op == "+" || op == "-"
                       -> let (expr, rest) = parseExpr ts2
                          in  (astOp op trm expr, rest)
               _       -> (trm, ts1)
      where astOp "+" = Plus
            astOp "-" = Minus*
    

    关于parsing - 如何使用 Haskell 解析中缀而不是前缀?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64049049/

    相关文章:

    ruby - Ruby解析器的原理是什么?

    python - 使用 python 从获取的 HTML 代码中提取特定信息

    haskell - 如何检查 Haskell 中缀运算符优先级

    deep-learning - BERT 分词器如何生成 (b, 24, 768) 的输入张量形状?

    python - 我怎样才能让 Spacy 停止将带连字符的数字和单词拆分成单独的标记?

    objective-c - Objective-C 中的 NSString 标记化

    javascript - Dojo 如何解析整个 JsonRestStore(JSON 到字符串)

    c# - 解析 FtpWebRequest ListDirectoryDe​​tails 行

    list - 从列表中选择递增序列

    list - 通过函数组合有效的列表追加/前置