haskell - Haskell 中的 Map.toList 性能

标签 haskell benchmarking bucket-sort

在下面的代码中,我对桶排序的实现进行了基准测试。

bucketsort 函数使用 _bucketsort 的结果,但将其展平为单个列表。令我惊讶的是这个过程(Map.toList)需要很多时间。

module Main where
import System.Random
import Criterion.Main
import qualified Data.List as List
import qualified Data.Map as Map
import Data.Maybe

insert :: (Ord a) => a -> [a] -> [a]
insert x [] = [x]
insert x (y:xs)
    | x <= y    = x:y:xs
    | otherwise = y : insert x xs

bucketsort :: (Integral a) => [a] -> [a]
bucketsort xs = List.concatMap (snd) . Map.toList $ _bucketsort xs Map.empty

_bucketsort :: (Integral k) => [k] -> Map.Map k [k] -> Map.Map k [k]
_bucketsort [] map = map
_bucketsort (x:xs) map =
    let bucket = x `div` 3
        bucketlist = maybeToList $ Map.lookup bucket map
        bucketInsert x [] = [x]
        bucketInsert x xs = insert x $ head xs
        ys = bucketInsert x bucketlist
        newMap = Map.insert bucket ys map
    in _bucketsort xs newMap

dataset n = List.take n $ randomRs (0, 9999) (mkStdGen 42)

main = defaultMain [ bench "bucketsort 96080" $ whnf bucketsort ((dataset 96080) :: [Int])
                   , bench "_bucketsort 96080" $ whnf _bucketsort ((dataset 96080):: [Int])]

这是 Criterion 基准测试的输出:

C:\>benchmark_bucketsort.exe
warming up
estimating clock resolution...
mean is 1.353299 us (640001 iterations)
found 1278266 outliers among 639999 samples (199.7%)
  638267 (99.7%) low severe
  639999 (100.0%) high severe
estimating cost of a clock call...
mean is 105.8728 ns (8 iterations)
found 14 outliers among 8 samples (175.0%)
  7 (87.5%) low severe
  7 (87.5%) high severe

benchmarking bucketsort 96080
collecting 100 samples, 1 iterations each, in estimated 24.35308 s
Warning: Couldn't open /dev/urandom
Warning: using system clock for seed instead (quality will be lower)
mean: 187.2037 ms, lb 182.7181 ms, ub 191.3842 ms, ci 0.950
std dev: 22.15054 ms, lb 19.47241 ms, ub 25.64983 ms, ci 0.950
variance introduced by outliers: 84.194%
variance is severely inflated by outliers

benchmarking _bucketsort 96080
mean: 8.823789 ns, lb 8.654692 ns, ub 9.049314 ns, ci 0.950
std dev: 952.9240 ps, lb 723.0241 ps, ub 1.154097 ns, ci 0.950
found 13 outliers among 100 samples (13.0%)
  13 (13.0%) high severe
variance introduced by outliers: 82.077%
variance is severely inflated by outliers

如果我的 bucketsort 函数可以写得更好并且更快,我不会感到惊讶。但到目前为止我还没有弄清楚如何。

此外,非常欢迎对我的 Haskell 代码进行任何改进/评论。

最佳答案

您没有在第二个基准测试中完全应用 _bucketsort,因此只是评估 WHNF 的部分应用函数,这毫不奇怪,速度相当快。

将相关行更改为

main = defaultMain [ bench "bucketsort 96080"  $ whnf bucketsort ((dataset 96080) :: [Int])
                   , bench "_bucketsort 96080" $ whnf (flip _bucketsort Map.empty) ((dataset 96080):: [Int])]

产量(在我的机器上):

warming up
estimating clock resolution...
mean is 2.357120 us (320001 iterations)
found 2630 outliers among 319999 samples (0.8%)
  2427 (0.8%) high severe
estimating cost of a clock call...
mean is 666.7750 ns (14 iterations)
found 1 outliers among 14 samples (7.1%)
  1 (7.1%) high severe

benchmarking bucketsort 96080
collecting 100 samples, 1 iterations each, in estimated 34.66980 s
mean: 244.3280 ms, lb 238.0601 ms, ub 250.6725 ms, ci 0.950
std dev: 32.37658 ms, lb 28.02356 ms, ub 38.10187 ms, ci 0.950
found 3 outliers among 100 samples (3.0%)
  3 (3.0%) low mild
variance introduced by outliers: 87.311%
variance is severely inflated by outliers

benchmarking _bucketsort 96080
collecting 100 samples, 1 iterations each, in estimated 24.65911 s
mean: 244.9425 ms, lb 239.1011 ms, ub 251.0300 ms, ci 0.950
std dev: 30.68877 ms, lb 26.48151 ms, ub 36.20961 ms, ci 0.950
variance introduced by outliers: 86.247%
variance is severely inflated by outliers

此外请注意,此基准测试并未完全强制列表,因为列表上的 whnf 只会评估顶级构造函数。这解释了为什么两个基准现在具有几乎相同的性能。将两个基准测试切换为 nf 将时间分别更改为 369.3022ms 和 354.3513ms,使 bucketsort 再次变慢。

关于haskell - Haskell 中的 Map.toList 性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/15244994/

相关文章:

haskell - 为什么 TypeSynonymInstances 不允许在实例头中使用部分应用的类型同义词?

haskell - 为什么 State 需要一个值?

docker - fio:对于数据集, block 大小太大

sorting - 我什么时候应该选择桶排序而不是其他排序算法?

list - 使用特定索引在 haskell 中切片列表

haskell - 逼近haskell中的导数时的令人费解的行为

go - 使用 Goroutines 进行基准测试

算法中的 C# 性能波动

java - 二维桶排序问题