performance - Why does Haskell enumerator-based IO call sigprocmask so often?

Tags: performance haskell ghc

Revision summary

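Okay, it looks like the syscalls are definitely GC-related, and the underlying problem is simply that GC happens too often. As far as I can tell from profiling, this seems to be tied to my use of splitWhen and pack.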

splitWhen's implementation converts each chunk from lazy Text to strict Text and concatenates them all, because it builds up a buffer of chunks. That is bound to allocate a lot.

pack has to allocate because it converts from one type to another, and it sits in my inner loop, so that makes sense too.
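A splitter that handles separators falling anywhere in the stream has to carry the unfinished tail of one chunk into the next, which is the buffering described above. The following is a simplified pure model of that idea, with the stream represented as a list of chunks; the names `splitAcrossChunks` and `grabFields` are hypothetical, and this is not enumerator's actual implementation. `grabFields` binds `T.pack field` once with `let`, which makes explicit that the needle is converted a single time and then shared by every comparison.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Simplified model (not enumerator's code): split a stream of chunks on a
-- predicate, carrying the unfinished tail of each chunk into the next one.
splitAcrossChunks :: (Char -> Bool) -> [T.Text] -> [T.Text]
splitAcrossChunks isSep = go T.empty
  where
    -- At end of input, flush whatever is still buffered.
    go pending [] = [pending | not (T.null pending)]
    go pending (chunk : rest) =
      let -- Copies the buffered tail and the new chunk into a fresh Text;
          -- a buffering splitter pays an allocation like this per chunk.
          joined = pending `T.append` chunk
          -- T.split never returns [], so init/last are safe here. Every
          -- piece but the last is complete; the last may continue.
          pieces = T.split isSep joined
      in init pieces ++ go (last pieces) rest

-- Hypothetical helper: keep only the pieces that contain the field name.
-- The needle is packed once and shared by every isInfixOf test.
grabFields :: String -> [T.Text] -> [T.Text]
grabFields field =
  let needle = T.pack field
  in filter (needle `T.isInfixOf`)
```

Composed as `grabFields field . splitAcrossChunks (\c -> c == ',' || c == '\n')`, the two functions play the roles of the question's `splitOn` and `grabField`.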

Original question

I've stumbled upon some surprising syscall activity in my Haskell enumerator-based IO. I'm hoping someone can shed some light on it.

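On and off, I've been playing with a Haskell version of a quick perl script I wrote a few months ago. The script reads some JSON from each line and then prints out a particular field, if it exists.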

Here's the perl version, and how I run it:

cat ~/sample_input | perl -lpe '($_) = grep(/type/, split(/,/))' > /dev/null
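To make the comparison with the Haskell program concrete, here is a pure per-line model of what this one-liner does; `perlLine` and `perlScript` are hypothetical names. `-l` chomps each input line and ends every print with a newline, `-p` prints `$_` after each line, and `($_) = grep(...)` keeps only the first comma-separated piece that matches, leaving `$_` undefined (printed as an empty line) when nothing matches. One assumption: the needle is treated as a literal substring, whereas perl's `/type/` is a regular expression; for a plain word like `type` the two agree.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (find)
import Data.Maybe (fromMaybe)
import qualified Data.Text as T

-- One input line: split on commas and keep the first piece containing the
-- needle, or an empty line if no piece does (perl's undef $_).
-- Assumption: literal substring match instead of perl's regex match.
perlLine :: T.Text -> T.Text -> T.Text
perlLine needle line =
  fromMaybe T.empty (find (needle `T.isInfixOf`) (T.splitOn "," line))

-- Whole input: exactly one output line per input line, as with -l and -p.
perlScript :: T.Text -> T.Text -> T.Text
perlScript needle = T.unlines . map (perlLine needle) . T.lines
```

One visible difference: the Haskell program in the question prints every matching piece, each followed by two blank lines, rather than exactly one output line per input line.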

Here's the Haskell version (it's invoked similarly to the perl version):
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Internal as EI
import qualified Data.Enumerator.Text as ET
import qualified Data.Enumerator.List as EL
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import System.Environment (getArgs)
import System.IO (stdin)

-- stdin, split into pieces on ',' and '\n', keeping the pieces that mention the field
fieldEnumerator :: String -> E.Enumerator T.Text IO b
fieldEnumerator field = enumStdin E.$= splitOn [',','\n'] E.$= grabField field

-- Stream stdin as Text chunks
enumStdin :: E.Enumerator T.Text IO b
enumStdin = ET.enumHandle stdin

-- Re-chunk the stream so that each element is one separator-delimited piece
splitOn :: [Char] -> EI.Enumeratee T.Text T.Text IO b
splitOn chars = (ET.splitWhen (`elem` chars))

-- Pass through only the pieces that contain the field name
grabField :: String -> EI.Enumeratee T.Text T.Text IO b
grabField = EL.filter . T.isInfixOf . T.pack

-- Print each matching piece, followed by two blank lines
intercalateNewlines :: E.Iteratee T.Text IO ()
intercalateNewlines = EL.mapM_ (\field -> (TI.putStrLn field >> (putStr "\n\n")))

runE :: E.Enumerator T.Text IO () -> IO ()
runE enum = E.run_ $ enum E.$$ intercalateNewlines

main :: IO ()
main = do
  (field:_) <- getArgs
  runE $ fieldEnumerator field

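Surprisingly, the trace of the Haskell version looks like this (the actual JSON is suppressed because it's data from work), while the perl version does what I expect: a bunch of reads, then a write, repeated.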
55333/0x8816f5:    366125       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    366136       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    367209       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    367218       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    368449       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    368458       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    369525       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    369534       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    370610       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    370620       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    371735       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    371744       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    371798       5      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    371802       3      1 read(0x0, SOME_JSON, 0x1FA0)      = 8096 0
55333/0x8816f5:    372907       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    372918       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    374063       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    374072       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    375147       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    375156       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    376283       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    376292       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    376809       6      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    376814       5      3 read(0x0, SOME_JSON, 0x1FA0)      = 8096 0
55333/0x8816f5:    377378       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    377387       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    378537       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    378546       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    379598       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    379604       3      0 sigreturn(0x7FFF5FBFF9A0, 0x1E, 0x1)        = 0 Err#-2
55333/0x8816f5:    379613       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    380667       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    380678       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    381862       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    381871       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    382032       6      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    382036       4      2 read(0x0, SOME_JSON, 0x1FA0)        = 8096 0
55333/0x8816f5:    383064       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    383073       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    384118       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    384127       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    385206       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    385215       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    386348       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    386358       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    387468       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    387477      11      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    387614       6      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    387620       5      3 read(0x0, SOME_JSON, 0x1FA0)        = 8096 0
55333/0x8816f5:    388597       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    388606       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    389707       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    389716       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    390261       7      3 select(0x2, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    390273       6      3 write(0x1, SOME_OUTPUT, 0x1FA0)      = 8096 0

Best answer

Are you worried about the allocations, or about the (overhead from?) calls to sigprocmask?

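If it's the former and you want to stick with the enumerator package, this small change helps: on a 4k test set it cuts allocation by about 50%, from 8MB to 4MB, and reduces gen0 GCs from 15 to 6.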

splitOn :: EI.Enumeratee T.Text T.Text IO b
splitOn = EL.concatMap (T.split fastSplit)

fastSplit :: Char -> Bool
fastSplit ','  = True
fastSplit '\n' = True
fastSplit _    = False
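`EL.concatMap` applies `T.split fastSplit` to each chunk independently, so unlike `ET.splitWhen` it keeps no buffer between chunks, which is presumably a large part of the saving. The flip side is that a field straddling a chunk boundary comes out as two pieces. The pure sketch below models the stream as a list of chunks to show the difference; `splitPerChunk` and `splitWhole` are hypothetical names, and whether the difference matters in practice depends on where the upstream enumerator ends its chunks.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Same predicate as in the answer.
fastSplit :: Char -> Bool
fastSplit ','  = True
fastSplit '\n' = True
fastSplit _    = False

-- Model of EL.concatMap (T.split fastSplit): each chunk is split on its
-- own, with no state carried over to the next chunk.
splitPerChunk :: [T.Text] -> [T.Text]
splitPerChunk = concatMap (T.split fastSplit)

-- For comparison: split the concatenated stream in one go.
splitWhole :: [T.Text] -> [T.Text]
splitWhole = T.split fastSplit . T.concat
```

If every chunk boundary falls next to a separator (for example, one complete line per chunk), the two produce the same non-empty pieces and therefore the same matches for a non-empty field name.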

Before (stats from +RTS -sstderr -RTS):
       8,212,680 bytes allocated in the heap
         696,184 bytes copied during GC
         148,656 bytes maximum residency (1 sample(s))
          30,664 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0        15 colls,     0 par    0.00s    0.00s     0.0001s    0.0005s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0010s    0.0010s

After:

       3,838,048 bytes allocated in the heap
         689,592 bytes copied during GC
         148,368 bytes maximum residency (1 sample(s))
          27,040 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         6 colls,     0 par    0.00s    0.00s     0.0001s    0.0003s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0006s    0.0006s
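As a quick check of the "about 50%" figure, using the bytes-allocated lines from the two runs:

$$
\frac{8{,}212{,}680 - 3{,}838{,}048}{8{,}212{,}680} = \frac{4{,}374{,}632}{8{,}212{,}680} \approx 0.533
$$

so roughly 53% fewer bytes were allocated, consistent with the 8MB to 4MB description.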

Which is a pretty reasonable improvement but definitely leaves something to be desired. Rather than kicking enumerator around too much more I took a stab at rewriting it in conduit-0.4.1 just for kicks. It should be equivalent...

import Data.Conduit as C
import qualified Data.Conduit.Binary as Cb
import qualified Data.Conduit.List as Cl
import qualified Data.Conduit.Text as Ct
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Control.Monad.Trans (MonadIO, liftIO)
import System.Environment
import System.IO (stdin)

grabField :: Monad m => String -> Conduit T.Text m T.Text
grabField = Cl.filter . T.isInfixOf . T.pack

printField :: MonadIO m => T.Text -> m ()
printField field = liftIO $ do
  TI.putStrLn field
  putStr "\n\n"

fastSplit :: Char -> Bool
fastSplit ','  = True
fastSplit '\n' = True
fastSplit _    = False

main :: IO ()
main = do
  field:_ <- getArgs
  runResourceT $ Cb.sourceHandle stdin
              $$ Ct.decode Ct.utf8
              =$ Cl.concatMap (T.split fastSplit)
              =$ grabField field
              =$ Cl.mapM_ printField

...but for some reason it allocates and retains less memory:

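         835,688 bytes allocated in the heap
           8,576 bytes copied during GC
          87,200 bytes maximum residency (1 sample(s))
          19,968 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0008s    0.0008s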
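For scale, comparing the bytes-allocated figures of the three runs:

$$
\frac{835{,}688}{8{,}212{,}680} \approx 0.102, \qquad \frac{835{,}688}{3{,}838{,}048} \approx 0.218
$$

That is, the conduit version allocated about a tenth as much as the original enumerator version and a little over a fifth as much as the improved one.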

Regarding "performance - Why does Haskell enumerator-based IO call sigprocmask so often?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10273696/
