Haskell:如何摆脱内存泄漏

我正在尝试在互联网上找到的“所有可能”技巧，以及 !、seq、deepseq 的“所有可能”组合，...但是我找不到在以下程序中抑制内存泄漏的方法

{-# LANGUAGE BangPatterns #-}

import Data.List
import Control.DeepSeq

foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs 
foldl'' _ z [] = z

statistics :: [[Double]] -> [(Double, Double)]
statistics (z:zs) = normalize $ foldl'' go acc zs
  where acc = map (\x -> (x, x^2)) z
        go = zipWith (\(a, b) x -> (a + x, b + x^2))
        n  = 1 + (length zs)  
        dn = fromIntegral n
        normalize = map (\(a, b) -> (a / dn, (b - a^2 / dn) / dn))

main =  mapM_ (putStrLn . show) (statistics ps)
   where ps = take nn $ unfoldr (Just . splitAt mm) $ map sin [1..]
         nn = 1000
         mm = 1000

它计算“来自nn测量的mm变量”的均值和方差。

你能给我一个提示吗？

编辑

正如答案中所指出的，问题出在调用length。所以更好的版本可能是

{-# LANGUAGE BangPatterns #-}

import Data.List
import Control.DeepSeq

foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs 
foldl'' _ z [] = z

statistics :: [[Double]] -> [(Double, Double)]
statistics (z:zs) = normalize $ foldl'' go acc zs
   where acc = (1, map (\x -> (x, x^2)) z)
         go = \(n, abs) xs -> (n + 1, zipWith (\(a, b) x -> (a + x, b + x^2)) abs xs)
         normalize (n, abs) = map (\(a, b) -> (a / n, (b - a^2 / n) / n)) abs

main =  mapM_ (putStrLn . show) (statistics ps)
   where ps = take nn $ unfoldr (Just . splitAt mm) $ map sin [1..]
         nn = 100
         mm = 1000000

但是如果 mm 由于 zipWith 创建的 thunk 很大，它仍然会泄漏。相反，我尝试过

zipWith' f xs ys = go f [] xs ys
   where go f zs (a:as) (b:bs) = 
             let zs' = (f a b) : zs in go f zs' as bs
         go _ zs _ _ = foldl' (\x y -> y : x) [] zs

但没有成功。

编辑

第二个改进示例以及 nn=100 和 mm=1e6 的分析输出:

   leak +RTS -p -h -RTS

total time  =        5.18 secs   (5180 ticks @ 1000 us, 1 processor)
total alloc = 4,769,555,408 bytes  (excludes profiling overheads)

    COST CENTRE       MODULE  %time %alloc

    main              Main     67.4   64.0
    main.ps           Main     18.1   21.6
    statistics.go.\   Main      6.7   11.7
    statistics.go.\.\ Main      5.3    1.9
    foldl''           Main      1.5    0.0


                                               individual     inherited
 COST CENTRE                 no.     entries  %time %alloc   %time %alloc

 MAIN                         46           0    0.0    0.0   100.0  100.0
  main                        94           0   67.4   64.0    67.4   64.0
  CAF                         91           0    0.0    0.0    32.6   36.0
   main                       92           1    0.0    0.0    32.6   36.0
    main.mm                   97           1    0.0    0.0     0.0    0.0
    main.nn                   96           1    0.0    0.0     0.0    0.0
    main.ps                   95           1   18.1   21.6    18.1   21.6
    statistics                93           1    0.0    0.0    14.5   14.4
     statistics.normalize    106           1    0.4    0.4     0.8    0.5
      statistics.normalize.\ 107      100000    0.4    0.1     0.4    0.1
     statistics.acc          102           1    0.2    0.3     0.3    0.3
      statistics.acc.\       104      100000    0.1    0.0     0.1    0.0
     statistics.go           100           1    0.0    0.0     0.0    0.0
     foldl''                  98          30    1.5    0.0    13.5   13.6
      foldl''.z'              99          29    0.0    0.0    12.0   13.6
       statistics.go         101           0    0.0    0.0    12.0   13.6
        statistics.go.\      103          29    6.7   11.7    12.0   13.6
         statistics.go.\.\   105     2900000    5.3    1.9     5.3    1.9

enter image description here

正如 Thomas M. DuBuisson 所说，“泄漏”似乎与 ps 的构造有关，但它还能如何构造呢？

最佳答案

主要是猜测，但是您调用 length xs 来获取 n 会强制输入列表的主干 xs，从而创建大量重击。要进行快速测试，请使用 1000 而不是 dn 并查看是否有帮助。

为了完全懒惰，您可以尝试定义标准化，而不必预先计算列表的长度。但这可能很难实现......

在您编辑的代码中，我相信没有真正的空间泄漏，只是昂贵的数据结构。切换到未装箱向量后，代码运行速度更快，RAM 为 133MB。请注意，除了向函数添加一些 V. 之外，我什么也不做:

{-# LANGUAGE BangPatterns #-}

import Data.List
import Control.DeepSeq
import qualified Data.Vector.Unboxed as V

foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs 
foldl'' _ z [] = z

statistics :: [V.Vector Double] -> V.Vector (Double, Double)
statistics (z:zs) = normalize $ foldl'' go acc zs
   where acc = (1, V.map (\x -> (x, x^2)) z)
         go = \(n, abs) xs -> (n + 1, V.zipWith (\(a, b) x -> (a + x, b + x^2)) abs xs)
         normalize (n, abs) = V.map (\(a, b) -> (a / n, (b - a^2 / n) / n)) abs

main =  V.mapM_ (putStrLn . show) (statistics ps)
   where ps = take nn $ map V.fromList $ unfoldr (Just . splitAt mm) $ map sin [1..]
         nn = 100
         mm = 1000000

使用 +RTS -s 运行此命令会给出这些统计信息(我在输出运行时按 Ctrl-C，因此执行时间很短):

   3,009,534,792 bytes allocated in the heap
     324,022,224 bytes copied during GC
      51,390,128 bytes maximum residency (16 sample(s))
       4,896,504 bytes maximum slop
             133 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5617 colls,     0 par    0.20s    0.20s     0.0000s    0.0006s
  Gen  1        16 colls,     0 par    0.10s    0.10s     0.0063s    0.0322s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.39s  (  5.59s elapsed)
  GC      time    0.30s  (  0.30s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.70s  (  5.89s elapsed)

  %GC     time      17.8%  (5.1% elapsed)

  Alloc rate    2,163,064,609 bytes per MUT second

  Productivity  82.2% of total user, 23.7% of total elapsed

最大驻留量约为 50MB，这很好地对应于在 V.zipWith 期间同时存在的 1000000 个 double 对的未装箱向量的三个副本。堆上的 50MB 数据和使用中的 133MB 内存之间的差异来自于我们有一个复制垃圾收集器(这使需求加倍)以及可能来自运行时系统或其他组件的一些开销。

关于Haskell:如何摆脱内存泄漏，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/18337833/

Haskell:如何摆脱内存泄漏

上一篇：ruby-on-rails-3 - 设计允许电子邮件在范围内唯一性

下一篇：asp.net-mvc-partialview - mvc中 View 和部分 View 有什么区别