Python MemoryError when using random.sample()

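Okay, I have a problem that I really need help with.

My program reads values from a pdb file and stores them in a list (array = []). It then takes every combination of 4 of the stored values and keeps them in a list called maxcoorlist. Because the number of combinations is so large, I only want to draw a sample of 1000-10000 of them to speed things up. However, when I do that, I get a memory error on the very line that takes the random sample.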

MemoryError                               Traceback (most recent call last)
<ipython-input-14-18438997b8c9> in <module>()
     77     maxcoorlist= itertools.combinations(array,4)
     78     random.seed(10)
---> 79     volumesample= random_sample(list(maxcoorlist), 1000)
     80     vol_list= [side(i) for i in volumesample]
     81     maxcoor=max(vol_list)

MemoryError:

It is also important that I use random.seed() in this code, because I will be using the seed to draw other samples.

Best answer

As the other answers mention, the list() call is what exhausts memory: it materializes every combination at once.
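To get a feel for why list() cannot cope, the number of 4-element combinations can be counted without generating any of them. The sketch below assumes Python 3.8+ for math.comb; the sizes tried are hypothetical, since the question does not say how many values array holds.

```python
import math

def count_combinations(n, r=4):
    """Number of r-element combinations of n items (no enumeration needed)."""
    return math.comb(n, r)

# Hypothetical sizes for `array`; the real length is not given in the question.
for n in (100, 1000, 5000):
    print(n, count_combinations(n))
```

Every one of those combinations would become a separate tuple object inside the list, so the memory needed grows roughly with the fourth power of the input size.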

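Instead, first iterate over maxcoorlist to find its length. Then draw random integers in the range [0, length) and add them to a set of indices until the set holds 1000 of them.

Then iterate over maxcoorlist again, and whenever the current index is in your index set, add the current value to your sample. Because itertools.combinations returns a one-shot iterator, it has to be recreated before the second pass.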
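The two passes described above could be sketched as follows. The function name sample_combinations and its parameters are made up for illustration, and a private random.Random(seed) instance stands in for the question's random.seed(10) call so that the seeding stays local to the function.

```python
import itertools
import random

def sample_combinations(items, r, k, seed=None):
    """Return k distinct r-combinations of items without building the full list.

    Hypothetical helper: the name and signature are not from the original post.
    """
    items = list(items)
    rng = random.Random(seed)

    # Pass 1: count the combinations by iterating, keeping none of them.
    length = sum(1 for _ in itertools.combinations(items, r))
    if k > length:
        raise ValueError("sample size exceeds the number of combinations")

    # Draw distinct random indices in [0, length) until there are k of them.
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.randrange(length))

    # Pass 2: recreate the iterator and keep only the chosen positions.
    return [combo
            for i, combo in enumerate(itertools.combinations(items, r))
            if i in chosen]
```

A call such as sample_combinations(array, 4, 1000, seed=10) would then replace the failing line. Note that the result comes back in combination order rather than in random order, and that both passes still take time proportional to the total number of combinations; only the memory use is bounded.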

Edit

One optimization is to compute the length of maxcoorlist directly instead of iterating over it:

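import math

n = len(array)  # number of stored values
r = 4           # size of each combination
# Use integer division: plain / would produce a float in Python 3
length = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))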
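With the length known up front, the counting pass disappears and the index set can be drawn in a single call. This is only a sketch under a few assumptions: Python 3.8+ (for math.comb, which computes the same value as the factorial formula), a made-up function name, and random.sample applied to a range object, which does not materialize the population in memory.

```python
import itertools
import math
import random

def sample_with_known_length(items, r, k, seed=10):
    """Sample k distinct r-combinations using a directly computed length.

    Hypothetical helper: the name and signature are not from the original post.
    """
    items = list(items)
    length = math.comb(len(items), r)  # n! / (r! * (n - r)!)
    rng = random.Random(seed)

    # range() is lazy, so sampling k indices from it needs little memory.
    wanted = set(rng.sample(range(length), k))
    if not wanted:
        return []
    last = max(wanted)

    picked = []
    for i, combo in enumerate(itertools.combinations(items, r)):
        if i in wanted:
            picked.append(combo)
        if i == last:
            break  # nothing left to collect, stop iterating early
    return picked
```

Stopping at the largest chosen index saves some work, but for a uniformly drawn index set that index usually lies near the end, so the single pass still visits most combinations.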

For more on the Python MemoryError when using random.sample(), see the similar question on Stack Overflow: https://stackoverflow.com/questions/17935050/
