我有以下 Haskell 代码,实现了“cat”unix 命令行实用程序的简单版本。在 400MB 文件上使用“时间”测试性能,速度大约慢 3 倍。 (我用来测试它的确切脚本位于代码下方)。
我的问题是:
关于问题 2 和 3:我使用了 GHC -prof,然后使用 +RTS -p 运行,但我发现这里的输出有点缺乏信息。
来源(Main.hs)
module Main where
import System.IO
import System.Environment
import Data.ByteString as BS
import Control.Monad
-- Copied from cat source code
bufsize = 1024*128
go handle buf = do
hPut stdout buf
eof <- hIsEOF handle
unless eof $ do
buf <- hGetSome handle bufsize
go handle buf
main = do
file <- fmap Prelude.head getArgs
handle <- openFile file ReadMode
buf <- hGetSome handle bufsize
hSetBuffering stdin $ BlockBuffering (Just bufsize)
hSetBuffering stdout $ BlockBuffering (Just bufsize)
go handle buf
计时脚本(run.sh):
#!/usr/bin/env bash
# Generate 10M lines of silly test data
yes aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | head -n 10000000 > huge
# Compile with optimisation
ghc -O2 Main.hs
# Run haskell
echo "timing Haskell"
time ./Main huge > /dev/null
echo ""
echo ""
# Run cat
echo "timing 'cat'"
time cat huge > /dev/null
我的结果:
timing Haskell
real 0m0.980s
user 0m0.296s
sys 0m0.684s
timing 'cat'
real 0m0.304s
user 0m0.001s
sys 0m0.302s
使用 -prof 编译并使用 +RTS -p 运行时的分析报告如下:
Sat Dec 13 21:26 2014 Time and Allocation Profiling Report (Final)
Main +RTS -p -RTS huge
total time = 0.92 secs (922 ticks @ 1000 us, 1 processor)
total alloc = 7,258,596,176 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 100.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 46 0 100.0 100.0 100.0 100.0
CAF GHC.Conc.Signal 84 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 81 0 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 76 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 70 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 69 0 0.0 0.0 0.0 0.0
最佳答案
这只是试图解决第二个问题的部分答案:
我使用 GHC.IO.Buffer
尝试了类似的操作接口(interface):
module Main where
import System.IO
import System.Environment
import GHC.IO.Buffer
import Data.ByteString as BS
import Control.Monad
-- Copied from cat source code
bufsize = 1024*128
go handle bufPtr = do
read <- hGetBuf handle bufPtr bufsize
when (read > 0) $ do
hPutBuf stdout bufPtr read
go handle bufPtr
main = do
file <- fmap Prelude.head getArgs
handle <- openFile file ReadMode
buf <- newByteBuffer bufsize WriteBuffer
withBuffer buf $ go handle
它似乎更接近“猫”的表现,但仍然肯定更慢......
time ./Cat huge > /dev/null
./Cat huge > /dev/null 0.00s user 0.06s system 76% cpu 0.081 total
time cat huge > /dev/null
cat huge > /dev/null 0.00s user 0.05s system 75% cpu 0.063 total
我认为使用缓冲区 API,我们可以在使用
hGetSome
时显式避免分配所有缓冲区字节串。就像在原始代码中一样,但我只是在这里猜测并且不知道两个编译代码中到底发生了什么......更新:在我的笔记本电脑上添加原始代码的性能:
time ./Cat2 huge > /dev/null
./Cat2 huge > /dev/null 0.12s user 0.10s system 99% cpu 0.219 total
更新 2:添加一些基本的分析结果:
原始代码:
Cat2 +RTS -p -RTS huge
total time = 0.21 secs (211 ticks @ 1000 us, 1 processor)
total alloc = 6,954,068,112 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 100.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 46 0 100.0 100.0 100.0 100.0
CAF GHC.IO.Handle.FD 86 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 80 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 79 0 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 75 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 72 0 0.0 0.0 0.0 0.0
缓冲区 API 代码:
Cat +RTS -p -RTS huge
total time = 0.06 secs (61 ticks @ 1000 us, 1 processor)
total alloc = 3,487,712 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 98.9
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 44 0 100.0 98.9 100.0 100.0
CAF GHC.IO.Handle.FD 85 0 0.0 1.0 0.0 1.0
CAF GHC.Conc.Signal 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 80 0 0.0 0.1 0.0 0.1
CAF GHC.IO.FD 79 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 71 0 0.0 0.0 0.0 0.0
特别注意分配成本的巨大差异......
关于performance - 使用 Data.ByteString 实现 unix 的 "cat"程序的 Haskell 性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27463669/