Data.Attoparsec.Text
导出 takeWhile
和 takeWhile1
:
takeWhile :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns
True
, and return the consumed input.This parser does not fail. It will return an empty string if the predicate returns
False
on the first character of input.[...]
takeWhile1 :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns
True
, and return the consumed input.This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns
True
or if there is no input left.
attoparsec
的文档鼓励用户
Use the
Text
-oriented parsers whenever possible, e.g.takeWhile1
instead ofmany1 anyChar
. There is about a factor of 100 difference in performance between the two kinds of parser.
这两个解析器非常有用,但我一直觉得需要一个更通用的 takeWhile1
版本,更具体地说,需要一些假设的解析器
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = undefined
将解析至少 lo
个满足谓词f
的字符,其中lo
是任意非负整数。
我查看了takeWhile1
's implementation ,但它使用了一堆 Data.Attoparsec.Text.Internal
私有(private)的函数,并且似乎不容易通用。
我想出了以下应用实现:
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding ( takeWhile )
import Control.Applicative ( (<*>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo =
T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f
它的工作原理如广告所示,
λ> parseOnly (takeWhileLo (== 'a') 4) "aaa"
Left "not enough input"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaa"
Right "aaaa"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
Right "aaaaaaaaaaaaa"
但是打包 count
返回的中间结果列表的需要让我担心,特别是对于 lo
很大的情况......这似乎违背了建议至
use the
Text
-oriented parsers whenever possible [...]
我错过了什么吗?是否有更有效/惯用的方法来实现这样的 takeWhileLo
组合器?
最佳答案
Parser
是一个 monad,因此您可以只检查返回值,如果长度不正确则失败:
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
text <- takeWhile f
case T.compareLength text lo of
LT -> empty
_ -> return text
compareLength
来自 text
包。它比比较text
的长度更有效,因为compareLength
可能会短路。
关于haskell - 如何编写 attoparsec 的 takeWhile1 的更通用(但高效)版本?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31146547/