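I need to take a random sample without replacement (each element occurs only once in the sample) from a longer list. I'm using the code below, but I'd like to know whether there is a better way to do this.
The purpose of the sampling is to be able to generalize the results of analyzing the sample to the population.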

    import System.Random

    -- | Take a random sample without replacement of size size from a list.
    takeRandomSample :: Int -> Int -> [a] -> [a]
    takeRandomSample seed size xs
        | size <= hi = subset xs rs   -- hi = length xs - 1, so this means size < length xs
        | otherwise  = error "Sample size must be smaller than population."
      where
        rs = randomSample seed size lo hi
        lo = 0
        hi = length xs - 1

    -- | Draw one random value in [lo, hi], returning it with the next generator.
    getOneRandomV g lo hi = randomR (lo, hi) g

    -- | Keep drawing values, discarding repeats, until size distinct ones are collected.
    rsHelper size lo hi g x acc
        | x `notElem` acc && length acc < size = rsHelper size lo hi new_g new_x (x:acc)
        | x `elem` acc && length acc < size    = rsHelper size lo hi new_g new_x acc
        | otherwise                            = acc
      where (new_x, new_g) = getOneRandomV g lo hi

    -- | Get a random sample without replacement of size size between lo and hi.
    randomSample seed size lo hi = rsHelper size lo hi g x []
      where (x, g) = getOneRandomV (mkStdGen seed) lo hi

    -- | Look up the elements of l at the given indices.
    subset l = map (l !!)

Best answer
Here's a quick "back-of-the-envelope" implementation of what Daniel Fischer suggested in his comment, using my preferred PRNG (mwc-random):

    {-# LANGUAGE BangPatterns #-}

    module Sample (sample) where

    import Control.Monad.Primitive
    import Data.Foldable (toList)
    import qualified Data.Sequence as Seq
    import System.Random.MWC

    sample :: PrimMonad m => [a] -> Int -> Gen (PrimState m) -> m [a]
    sample ys size = go 0 (l - 1) (Seq.fromList ys) where
        l = length ys
        -- After size swaps, the last size positions hold the sample.
        go !n !i xs g | n >= size = return $! (toList . Seq.drop (l - size)) xs
                      | otherwise = do
                          -- Pick a random position j in [0, i] and swap it with position i.
                          j <- uniformR (0, i) g
                          let toI  = xs `Seq.index` j
                              toJ  = xs `Seq.index` i
                              next = (Seq.update i toI . Seq.update j toJ) xs
                          go (n + 1) (i - 1) next g
    {-# INLINE sample #-}

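This is pretty much a (terse) functional rewrite of R's internal C version of sample() as it is called without replacement. Here, sample is just a wrapper over a recursive worker function that incrementally shuffles the population until the desired sample size is reached, and returns only that many shuffled elements. Writing the function like this ensures that GHC can inline it. It's easy to use: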

    *Main> create >>= sample [1..100] 10
    [51,94,58,3,91,70,19,65,24,53]

A production version may want to use something like a mutable vector instead of Data.Sequence, in order to cut down on the time spent doing GC.
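As a rough illustration of the mutable-vector idea, here is a minimal sketch of the same partial shuffle written against Data.Vector.Mutable. It is not part of the original answer: the names sampleMV and demo are hypothetical, it assumes the vector package is installed alongside mwc-random and primitive, and, unlike sample above, it clamps the requested size to the length of the list instead of leaving that case undefined. It also moves the chosen elements to the front of the vector rather than the back.

```haskell
import Control.Monad (forM_)
import Control.Monad.Primitive (PrimMonad, PrimState)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV
import System.Random.MWC (Gen, create, uniformR)

-- | Sample without replacement using a partial Fisher-Yates shuffle on a
-- mutable vector. 'sampleMV' is a hypothetical name, not a library function.
sampleMV :: PrimMonad m => [a] -> Int -> Gen (PrimState m) -> m [a]
sampleMV ys size g = do
    mv <- V.thaw (V.fromList ys)
    let n = MV.length mv
        k = max 0 (min size n)  -- assumption: clamp size into [0, n]
    -- For each of the first k slots, swap in a uniformly chosen element
    -- from the not-yet-fixed suffix [i, n - 1].
    forM_ [0 .. k - 1] $ \i -> do
        j <- uniformR (i, n - 1) g
        MV.swap mv i j
    -- mv is not used after this point, so freezing without a copy is safe.
    V.toList . V.take k <$> V.unsafeFreeze mv

-- | Hypothetical usage: draw 10 distinct values from 1..100.
demo :: IO [Int]
demo = create >>= sampleMV [1 .. 100] 10
```

Only the first k slots are shuffled, so the work after building the vector is proportional to the sample size, and the swaps mutate the vector in place instead of allocating a new sequence on every step.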
Regarding "list - a better way to take a random sample without replacement from a list in Haskell", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13779630/