haskell - Conduit - Splitting a ByteString source into byte-sized chunks

Tags: haskell conduit

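With sourceFile we get a stream of ByteStrings.

Following up on my other question, "Combining multiple Sources/Producers into one", I was able to use ZipSource to build a source of (StdGen, ByteString) from sourceFile and a custom source that produces an infinite stream of StdGens.

What I want to achieve is to pair each StdGen with a one-byte ByteString, but with my current implementation I get a single StdGen paired with the entire contents of the input file from sourceFile.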


{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE OverloadedStrings #-}

import System.Random (StdGen(..), split, newStdGen, randomR)
import ClassyPrelude.Conduit as Prelude
import Control.Monad.Trans.Resource (runResourceT, ResourceT(..))
import qualified Data.ByteString as BS
import Data.Conduit.Binary (isolate)

-- generate an infinite source of random number seeds
sourceStdGen :: MonadIO m => Source m StdGen
sourceStdGen = do
    g <- liftIO newStdGen
    loop g
    where loop gin = do
            let g' = fst (split gin)
            yield gin
            loop g'

-- combine the sources into one
sourceInput :: (MonadResource m, MonadIO m) => FilePath -> Source m (StdGen, ByteString)
sourceInput fp = getZipSource $ (,)
    <$> ZipSource sourceStdGen
    <*> ZipSource (sourceFile fp $= isolate 1)

-- a simple conduit which generates a random number from the provided StdGen
-- and appends the byte value to the provided ByteString
simpleConduit :: Conduit (StdGen, ByteString) (ResourceT IO) ByteString
simpleConduit = mapC process 

process :: (StdGen, ByteString) -> ByteString
process (g, bs) =
    let rnd = fst $ randomR (40,50) g
    in bs ++ pack [rnd]

main :: IO ()
main = do
    runResourceT $ sourceInput "test.txt" $$ simpleConduit =$ sinkFile "output.txt"

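In conduit terms, I thought isolate would do an await, yield the head of the incoming ByteString stream, and leftover the rest (putting it back on the queue of the incoming stream). Basically, what I am trying to do is split the incoming ByteString stream into byte-sized chunks.

Am I using it correctly? If isolate is not the function I should be using, can anyone suggest another function that splits the stream into chunks of an arbitrary number of bytes?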



import System.Random (StdGen, split, newStdGen, randomR)
import qualified Data.ByteString as BS
import Data.Conduit 
import Data.ByteString (ByteString, pack, unpack, singleton)
import Control.Monad.Trans (MonadIO (..))
import Data.List (unfoldr)
import qualified Data.Conduit.List as L
import Data.Monoid ((<>))

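-- Pair every byte of the file with its own generator.
input :: MonadIO m => FilePath -> Source m (StdGen, ByteString)
input path = do
  -- infinite list of generators, obtained by repeatedly splitting the first one
  gs <- unfoldr (Just . split) `fmap` liftIO newStdGen
  -- strict read of the whole file, then one single-byte ByteString per byte
  bs <- (map singleton . unpack) `fmap` liftIO (BS.readFile path)
  -- zip stops when the file's bytes run out
  mapM_ yield (zip gs bs)

-- Append one random byte (an ASCII lowercase letter) to each input byte
-- and concatenate all the pieces.
output :: Monad m => Sink (StdGen, ByteString) m ByteString
output = L.foldMap (\(g, bs) -> let rnd = fst $ randomR (97,122) g in bs <> pack [rnd])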

main :: IO ()
main = (input "in.txt" $$ output) >>=  BS.writeFile "out.txt"

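It would probably be more efficient to omit map singleton; you could also work with the Word8 values directly and convert back to a ByteString at the end.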
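If you would rather keep the file streaming instead of reading it in one go with BS.readFile, the re-chunking can also be written as a conduit. The following is only a minimal sketch, not part of the original answer: splitBytes and bytesC are hypothetical names, and it assumes the classic operator-style conduit API used above (on newer conduit releases Conduit, $$ and =$ are deprecated in favour of ConduitT, runConduit and .|). It relies on concatMap from Data.Conduit.List, which emits every element of the list produced for each input chunk.

```haskell
import           Data.ByteString   (ByteString)
import qualified Data.ByteString   as BS
import           Data.Conduit
import qualified Data.Conduit.List as L

-- Hypothetical helper (not from the original post): cut one chunk into
-- single-byte ByteStrings. An empty chunk produces no output.
splitBytes :: ByteString -> [ByteString]
splitBytes = map BS.singleton . BS.unpack

-- Hypothetical conduit: re-chunk an incoming ByteString stream so that
-- every downstream element is exactly one byte long.
bytesC :: Monad m => Conduit ByteString m ByteString
bytesC = L.concatMap splitBytes
```

In the question's sourceInput, `sourceFile fp $= bytesC` could then stand in for `sourceFile fp $= isolate 1`. As far as I understand it, isolate only limits how many bytes in total are passed downstream; it does not re-chunk the stream, which is why it is not the right tool here.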

Source: a similar question about "haskell - Conduit - Splitting a ByteString source into byte-sized chunks" on Stack Overflow: https://stackoverflow.com/questions/23363290/


