list - 从 Haskell 中的列表中获取随机样本而不替换的更好方法

我需要从更长的列表中随机抽取一个样本而不进行替换(每个元素在样本中只出现一次)。我正在使用下面的代码，但现在我想知道:

是否有执行此操作的库函数？

如何改进此代码？ (我是 Haskell 初学者，所以即使有库函数也很有用)。

抽样的目的是能够将分析样本的结果推广到总体。

import System.Random

-- | Take a random sample without replacement of size size from a list.
takeRandomSample :: Int -> Int -> [a] -> [a]
takeRandomSample seed size xs
    | size < hi  = subset xs rs
    | otherwise = error "Sample size must be smaller than population."
    where
        rs = randomSample seed size lo hi
        lo = 0
        hi = length xs - 1

getOneRandomV g lo hi = randomR (lo, hi) g

rsHelper size lo hi g x acc
    | x `notElem` acc && length acc < size = rsHelper size lo hi new_g new_x (x:acc)
    | x `elem` acc && length acc < size = rsHelper size lo hi new_g new_x acc
    | otherwise = acc
    where (new_x, new_g) = getOneRandomV g lo hi

-- | Get a random sample without replacement of size size between lo and hi.
randomSample seed size lo hi = rsHelper size lo hi g x [] where
(x, g)  = getOneRandomV (mkStdGen seed) lo hi

subset l = map (l !!)

最佳答案

这是 Daniel Fischer 在评论中建议的快速“粗略”实现，使用我首选的 PRNG(mwc-random):

{-# LANGUAGE BangPatterns #-}

module Sample (sample) where

import Control.Monad.Primitive
import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import System.Random.MWC

sample :: PrimMonad m => [a] -> Int -> Gen (PrimState m) -> m [a]
sample ys size = go 0 (l - 1) (Seq.fromList ys) where
    l = length ys
    go !n !i xs g | n >= size = return $! (toList . Seq.drop (l - size)) xs
                  | otherwise = do
                      j <- uniformR (0, i) g
                      let toI  = xs `Seq.index` j
                          toJ  = xs `Seq.index` i
                          next = (Seq.update i toI . Seq.update j toJ) xs
                      go (n + 1) (i - 1) next g
{-# INLINE sample #-}

这几乎是对 R 的内部 C 版本 sample() 的(简洁的)功能重写。因为它被称为没有替换。
sample只是一个递归工作函数的包装器，它递增地打乱总体，直到达到所需的样本大小，只返回那么多打乱的元素。像这样编写函数可确保 GHC 可以内联它。

它易于使用:

*Main> create >>= sample [1..100] 10
[51,94,58,3,91,70,19,65,24,53]

生产版本可能希望使用可变向量之类的东西，而不是 Data.Sequence。为了减少花费在 GC 上的时间。

关于list - 从 Haskell 中的列表中获取随机样本而不替换的更好方法，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/13779630/

list - 从 Haskell 中的列表中获取随机样本而不替换的更好方法

上一篇：filepicker.io - 有没有办法在使用 filepicker.io 上传之前调整相机在 iOs 上拍摄的照片的大小？

下一篇：asp.net-core - UseStatusCodePagesWithRedirects 和 UseStatusCodePagesWithReExecute 之间的区别 - Asp.net core 中的状态代码页