我有一个 100K+ 元素的列表,这是一个有限列表。目前我正在使用 Data.List 函数 elem。在查看 Data.List 信息页面时,还有查找和过滤器。其中之一会比 elem 函数更快吗?
最佳答案
以防万一我们还没有完全击败死马......
不同的集合表示存在巨大的性能差异。作为一个示例(可能与您的用例匹配也可能不匹配),请考虑采用 200K 个随机元素的列表和计算来确定 200 个随机元素的成员资格。
我已经实现了三种明显的方法来做到这一点 - 使用 elem
在列表中,转换为 HashSet
检查成员资格,并执行布隆过滤器和哈希集的混合。基准测试显示列表解决方案比哈希集慢 3 个数量级,比混合慢约 2 倍。
benchmarking list
mean: 460.7106 ms, lb 459.2952 ms, ub 462.8491 ms, ci 0.950
std dev: 8.741096 ms, lb 6.293703 ms, ub 12.23082 ms, ci 0.950
benchmarking hashset
mean: 175.2730 us, lb 173.9140 us, ub 177.0802 us, ci 0.950
std dev: 7.966790 us, lb 6.391454 us, ub 10.25774 us, ci 0.950
benchmarking bloom+hashset
mean: 88.22402 us, lb 87.35856 us, ub 89.66884 us, ci 0.950
std dev: 5.642663 us, lb 3.793715 us, ub 8.264222 us, ci 0.950
和代码:
import qualified Data.HashSet as Set
import Data.HashSet (Set)
import qualified Data.BloomFilter as BF
import qualified Data.BloomFilter.Easy as BF
import Data.BloomFilter (Bloom)
import Data.BloomFilter.Hash as H2
import Data.Hashable as H1
import Criterion.Main
import System.Random
data MySet a = MS (Set a) (Bloom a)
fromList :: (H2.Hashable a, H1.Hashable a, Ord a) => [a] -> MySet a
fromList as =
let hs = Set.fromList as
bf = BF.easyList 0.2 as
in hs `seq` bf `seq` MS hs bf
member :: (H2.Hashable a, H1.Hashable a, Ord a) => a -> MySet a -> Bool
member e (MS hs bf)
| BF.elemB e bf = Set.member e hs
| otherwise = False
main = do
list <- take 200000 `fmap` randomsIO :: IO [Int]
xs <- take 200 `fmap` randomsIO
let hs = Set.fromList list
bhs = fromList list
defaultMain
[ bench "list" $ nf (map (`elem` list)) xs
, bench "hashset" $ nf (map (`Set.member` hs)) xs
, bench "bloom+hashset" $ nf (map (`member` bhs)) xs
]
randomsIO = randoms `fmap` newStdGen
关于list - Haskell:搜索大尺寸列表的最佳方式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22638997/