sorting - Haskell 中的快速排序

标签 sorting haskell quicksort

阅读堆栈溢出问题后 Using vectors for performance improvement in Haskell描述快速就地 quicksort在 Haskell 中,我给自己设定了两个目标:

  • 实现中位数为 3 的相同算法,以避免在预排序向量上表现不佳;
  • 制作并行版本。

  • 这是结果(为简单起见,留下了一些小片段):
    import qualified Data.Vector.Unboxed.Mutable as MV
    import qualified Data.Vector.Generic.Mutable as GM
    
    type Vector = MV.IOVector Int
    type Sort = Vector -> IO ()
    
    medianofthreepartition :: Vector -> Int -> IO Int
    medianofthreepartition uv li = do
        p1 <- MV.unsafeRead uv li
        p2 <- MV.unsafeRead uv $ li `div` 2
        p3 <- MV.unsafeRead uv 0
        let p = median p1 p2 p3
        GM.unstablePartition (< p) uv
    
    vquicksort :: Sort
    vquicksort uv = do
        let li = MV.length uv - 1
        j <- medianofthreepartition uv li
        when (j > 1) (vquicksort (MV.unsafeSlice 0 j uv))
        when (j + 1 < li) (vquicksort (MV.unsafeSlice (j+1) (li-j) uv))
    
    vparquicksort :: Sort
    vparquicksort uv = do
        let li = MV.length uv - 1
        j <- medianofthreepartition uv li
        t1 <- tryfork (j > 1) (vparquicksort (MV.unsafeSlice 0 j uv))
        t2 <- tryfork (j + 1 < li) (vparquicksort (MV.unsafeSlice (j+1) (li-j) uv))
        wait t1
        wait t2
    
    tryfork :: Bool -> IO () -> IO (Maybe (MVar ()))
    tryfork False _ = return Nothing
    tryfork True action = do
      done <- newEmptyMVar :: IO (MVar ())
      _ <- forkFinally action (\_ -> putMVar done ())
      return $ Just done
    
    wait :: Maybe (MVar ()) -> IO ()
    wait Nothing = return ()
    wait (Just done) = swapMVar done ()
    
    median :: Int -> Int -> Int -> Int
    median a b c
            | a > b =
                    if b > c then b
                            else if a > c then c
                                    else a
            | otherwise =
                    if a > c then a
                            else if b > c then c
                                    else b
    

    对于具有 1,000,000 个元素的向量,我得到以下结果:
    "Number of threads: 4"
    
    "**** Parallel ****"
    "Testing sort with length: 1000000"
    "Creating vector"
    "Printing vector"
    "Sorting random vector"
    CPU time:  12.30 s
    "Sorting ordered vector"
    CPU time:   9.44 s
    
    "**** Single thread ****"
    "Testing sort with length: 1000000"
    "Creating vector"
    "Printing vector"
    "Sorting random vector"
    CPU time:   0.27 s
    "Sorting ordered vector"
    CPU time:   0.39 s
    

    我的问题是:
  • 为什么预排序向量的性能仍然下降?
  • 为什么使用 forkIO 和四线程无法提高性能?
  • 最佳答案

    更好的办法是使用 Control.Parallel.Strategies并行化快速排序。使用这种方法,您不会为每个可以并行执行的代码创建昂贵的线程。您还可以创建纯计算而不是 IO。

    然后你必须根据你拥有的核心数量进行编译:
    http://www.haskell.org/ghc/docs/latest/html/users_guide/using-concurrent.html

    例如,看看 Jim Apple 编写的这个简单的列表快速排序:

    import Data.HashTable as H
    import Data.Array.IO
    import Control.Parallel.Strategies
    import Control.Monad
    import System
    
    exch a i r =
        do tmpi <- readArray a i
           tmpr <- readArray a r
           writeArray a i tmpr
           writeArray a i tmpi
    
    bool a b c = if c then a else b
    
    quicksort arr l r =
      if r <= l then return () else do
        i <- loop (l-1) r =<< readArray arr r
        exch arr i r
        withStrategy rpar $ quicksort arr l (i-1)
        quicksort arr (i+1) r
      where
        loop i j v = do
          (i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1))
          if (i' < j') then exch arr i' j' >> loop i' j' v
                       else return i'
        find p f i = if i == l then return i
                     else bool (return i) (find p f (f i)) . p =<< readArray arr i
    
    main = 
        do [testSize] <- fmap (fmap read) getArgs
           arr <- testPar testSize
           ans <- readArray arr  (testSize `div` 2)
           print ans
    
    testPar testSize =
        do x <- testArray testSize
           quicksort x 0 (testSize - 1)
           return x
    
    testArray :: Int -> IO (IOArray Int Double)
    testArray testSize = 
        do ans <- newListArray (0,testSize-1) [fromIntegral $ H.hashString $ show i | i <- [1..testSize]]
           return ans
    

    关于sorting - Haskell 中的快速排序,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17912348/

    相关文章:

    sorting - powershell 排序多维数组

    mysql - 用序列号mysql更新列

    haskell - Haskell 的计算机视觉库

    algorithm - 为什么是这个成本?

    C 这是什么排序算法?

    c - C中链表的快速排序

    c++ - 当 std::sort 中的比较函数总是返回 true 时,为什么会出现运行时错误?

    database - 插入记录时建立索引与不建立索引

    Haskell HDBC-ODBC 内存泄漏?

    haskell - 如何在 Haskell 中持久保存大数据以进行高效反序列化