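Haskell Lazy ByteString + read/write progress function

I am learning Haskell lazy IO.

I am looking for an elegant way to copy a large file (8 GB) while printing the copy progress to the console.

Consider the following simple program, which copies a file silently.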

module Main where

import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          body <- B.readFile from
          B.writeFile to body

Imagine there is a callback function you want to use for reporting:

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ (show count)

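Question: how can the onReadBytes function be woven into the lazy ByteString so that it is called back on each successful read? Or, if this design is a bad one, what is the Haskell way of doing it?

Note: the frequency of the callback does not matter. It can be called every 1024 bytes or every 1 MB.

Answer: Many thanks to camccann for the answer. I recommend reading it in full.

Below is my version of the code, based on camccann's, which you may find useful.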

module Main where

import Control.Monad (unless)
import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          withFile from ReadMode $ \fromH ->
            withFile to WriteMode $ \toH ->
              copyH fromH toH $ \x -> putStrLn $ "Bytes copied: " ++ show x

copyH :: Handle -> Handle -> (Integer -> IO()) -> IO ()
copyH fromH toH onProgress =
    copy (B.hGet fromH (256 * 1024)) (write toH) B.null onProgress
    where write o x  = do B.hPut o x
                          return . fromIntegral $ B.length x

copy :: (Monad m) => m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m()) -> m()
copy = copy_ 0

copy_ :: (Monad m) => Integer -> m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m()) -> m()
copy_ count inp outp done onProgress = do x <- inp
                                          unless (done x) $
                                            do n <- outp x
                                               onProgress (n + count)
                                               copy_ (n + count) inp outp done onProgress

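Best answer

First, I'd like to note that a fair number of Haskell programmers regard lazy IO in general with some suspicion. It technically violates purity, but in a limited way that (as far as I'm aware) isn't noticeable when running a single program on consistent input[0]. On the other hand, plenty of people are fine with it, again because it involves only a very restricted kind of impurity.

To create the illusion of a lazy data structure that is actually produced by on-demand I/O, functions like readFile are implemented using sneaky shenanigans behind the scenes. Weaving in the on-demand I/O is inherent to the function, and it isn't really extensible, for pretty much the same reasons that the illusion of getting a regular ByteString from it is convincing.

Handwaving the details and writing pseudocode, something like readFile basically works like this: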

lazyInput inp = lazyIO (lazyInput' inp)
lazyInput' inp = do x <- readFrom inp
                    if (endOfInput inp)
                        then return []
                        else do xs <- lazyInput inp
                                return (x:xs)

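...where each time lazyIO is called, it defers the I/O until the value is actually used. To invoke your reporting function each time an actual read occurs, you'd need to weave it in directly, and while a generalized version of such a function could be written, to my knowledge none exists.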

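Given the above, you have a few options:

Look up the implementation of the lazy I/O functions you're using, and implement your own versions that include the progress reporting function. If this feels like a dirty hack, that's because it pretty much is, but there you go.

Abandon lazy I/O and switch to something more explicit and composable. This is the direction the Haskell community as a whole seems to be heading in, specifically toward some variation on Iteratees, which give you nicely composable little stream-processor building blocks with more predictable behavior. The downside is that the concept is still under active development, so there's no consensus on an implementation or a single starting point for learning to use them.

Abandon lazy I/O and switch to plain old regular I/O: write an IO action that reads a chunk, prints the reporting info, and processes as much input as it can; then invoke it in a loop until done. Depending on what you're doing with the input and how much you rely on laziness in your processing, this could involve anything from writing a couple of nearly trivial functions to building a bunch of finite-state-machine stream processors and getting 90% of the way to reinventing Iteratees.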

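[0]: The underlying function here is called unsafeInterleaveIO, and to the best of my knowledge the only ways to observe impurity from it require either running the program on different input (in which case it's entitled to behave differently anyhow; it just may do so in ways that don't make sense in pure code), or changing the code in certain ways (i.e., refactorings that should have no effect can have non-local effects).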
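As a rough illustration of the first option, here is a minimal sketch of a hand-rolled lazy reader that weaves a progress callback into each deferred read via unsafeInterleaveIO. The names hGetContentsWithProgress and copyWithProgress are hypothetical (they are not part of the bytestring library), and the sketch assumes that the helper owns the handle and closes it at end of input, so it should not be combined with withFile.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as B
import System.IO (Handle, IOMode (ReadMode), hClose, openBinaryFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper: build a lazy ByteString whose chunks are read on
-- demand. Each time a chunk is actually read, the callback receives the
-- running total of bytes read so far.
hGetContentsWithProgress
  :: Int -> (Integer -> IO ()) -> Handle -> IO B.ByteString
hGetContentsWithProgress chunkSize onProgress h = B.fromChunks <$> go 0
  where
    go :: Integer -> IO [S.ByteString]
    go total = unsafeInterleaveIO $ do
      -- This body runs only when the consumer demands the next chunk.
      chunk <- S.hGet h chunkSize
      if S.null chunk
        then hClose h >> return []   -- end of input: release the handle
        else do
          let total' = total + fromIntegral (S.length chunk)
          onProgress total'
          rest <- go total'          -- deferred again, not read yet
          return (chunk : rest)

-- Hypothetical usage: the copy itself stays a plain B.writeFile, and the
-- reads (and callbacks) happen as writeFile consumes the chunks.
copyWithProgress :: FilePath -> FilePath -> IO ()
copyWithProgress from to = do
  h <- openBinaryFile from ReadMode
  body <- hGetContentsWithProgress (256 * 1024) report h
  B.writeFile to body
  where
    report n = putStrLn ("Bytes read: " ++ show n)
```

Note that the callback fires whenever the consumer happens to force the next chunk, so its timing depends on how the lazy ByteString is used. That is the same kind of non-local behavior the footnote describes.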

Here's a rough example of doing things the "plain old regular I/O" way, using more composable functions:

import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          -- withFile closes the handle for us after the action completes
          withFile from ReadMode $ \inH ->
            withFile to WriteMode $ \outH ->
                -- run the loop with the appropriate actions
                runloop (B.hGet inH 128) (processBytes outH) B.null

-- note the very generic type; this is useful, because it proves that the
-- runloop function can only execute what it's given, not do anything else
-- behind our backs.
runloop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
runloop inp outp done = do x <- inp
                           if done x
                             then return ()
                             else do outp x
                                     runloop inp outp done

-- write the output and report progress to stdout. note that this can be easily
-- modified, or composed with other output functions.
processBytes :: Handle -> B.ByteString -> IO ()
processBytes h bs | B.null bs = return ()
                  | otherwise = do onReadBytes (fromIntegral $ B.length bs)
                                   B.hPut h bs

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ (show count)

The "128" above is the number of bytes to read at a time. Running it on a random source file from my "Stack Overflow snippets" directory:

$ runhaskell ReadBStr.hs Corec.hs temp
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 83
$
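The example above reports the size of each chunk. Since the question asks for copy progress at a coarse frequency, one possible variation is to report the running total, and only after a given number of further bytes has been copied. The following is a minimal sketch under those assumptions; copyThrottled and copyFileThrottled are hypothetical names, and strict ByteString chunks are used because each chunk is written out immediately.

```haskell
import Control.Monad (when)
import qualified Data.ByteString as S
import System.IO (Handle, IOMode (ReadMode, WriteMode), withBinaryFile)

-- Copy everything from one handle to another in chunks of chunkSize bytes.
-- The callback receives the running total, but only once at least `step`
-- further bytes have been copied since the previous report, plus one final
-- report at end of input if the last total was not reported yet.
copyThrottled
  :: Int -> Integer -> (Integer -> IO ()) -> Handle -> Handle -> IO ()
copyThrottled chunkSize step onProgress inH outH = go 0 0
  where
    go total lastReported = do
      chunk <- S.hGet inH chunkSize
      if S.null chunk
        then when (total /= lastReported) (onProgress total)
        else do
          S.hPut outH chunk
          let total' = total + fromIntegral (S.length chunk)
          if total' - lastReported >= step
            then onProgress total' >> go total' total'
            else go total' lastReported

-- Convenience wrapper that opens and closes both files.
copyFileThrottled
  :: Int -> Integer -> (Integer -> IO ()) -> FilePath -> FilePath -> IO ()
copyFileThrottled chunkSize step onProgress from to =
  withBinaryFile from ReadMode $ \inH ->
    withBinaryFile to WriteMode $ \outH ->
      copyThrottled chunkSize step onProgress inH outH
```

For instance, copyFileThrottled (256 * 1024) (1024 * 1024) (\n -> putStrLn ("Bytes copied: " ++ show n)) from to would read 256 KB at a time and print roughly once per megabyte.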

Regarding Haskell Lazy ByteString + read/write progress function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6668716/
