python - A more memory-efficient option than numpy.where?

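I have a large array (several million elements) and need to slice out a small subset (a few hundred elements) based on several different criteria. I'm currently using np.where, along these lines: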

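import numpy as np

def DoSomeJunk(*arrays):
    pass  # hypothetical stub standing in for the real per-slice processing

for threshold in np.arange(0, 1, .1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    inds = np.where((x < threshold) & (y > threshold) & (z > threshold) & (z < threshold + 0.1))
    # reuse the same index tuple on every array that needs the slice
    DoSomeJunk(x[inds], y[inds], z[inds])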

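I then use inds to pull the right points out of the various arrays. However, I get a MemoryError on that np.where line. I've seen in a few related posts that np.where can be a memory hog and may copy data.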

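Does having multiple & operators in there mean the data is copied multiple times? Is there a more efficient way to slice the data that uses less memory while still keeping the list of indices I want, so that I can reuse the same slice in several places later?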

Note that the example I posted doesn't actually raise the error, but its structure is similar to what I have.

Best answer

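Each condition creates a temporary bool array with the same number of elements as x, y and z, and each & creates another one for its result. To reduce this, you can build the mask step by step, combining each new condition into a single mask in place: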

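import numpy as np

for threshold in np.arange(0, 1, .1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    inds = x < threshold            # first condition becomes the mask
    inds &= y > threshold           # later conditions are AND-ed into it in place
    inds &= z > threshold
    inds &= z < threshold + 0.1
    # DoSomeJunk is the placeholder processing function from the question
    DoSomeJunk(x[inds], y[inds], z[inds])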

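For this example, this reduces memory usage from 160 MB to 40 MB. Those figures appear to count 40 MB per temporary, which is the size of a 5,000,000-element float64 array; a NumPy bool array of that length occupies 5 MB (one byte per element), so the boolean temporaries are smaller than that in absolute terms. The point still holds: accumulating into one mask keeps fewer full-length temporaries alive at once.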
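As a possible further step, not part of the original answer, the comparisons themselves can write into preallocated buffers through the `out=` argument of NumPy ufuncs, so that only two boolean arrays exist for the whole computation. The sketch below assumes a hypothetical helper name `build_mask` and a hypothetical `width` parameter; it also uses `np.flatnonzero` to turn the mask into a compact index array that can be reused on any array of the same length, which is what the question asks for.

```python
import numpy as np


def build_mask(x, y, z, threshold, width=0.1):
    """Hypothetical helper: combine the four conditions using two bool buffers."""
    mask = np.empty(x.shape, dtype=bool)     # accumulated result
    scratch = np.empty(x.shape, dtype=bool)  # reused for every later comparison

    np.less(x, threshold, out=mask)             # mask = x < threshold
    np.greater(y, threshold, out=scratch)       # scratch = y > threshold
    np.logical_and(mask, scratch, out=mask)
    np.greater(z, threshold, out=scratch)       # scratch = z > threshold
    np.logical_and(mask, scratch, out=mask)
    np.less(z, threshold + width, out=scratch)  # scratch = z < threshold + width
    np.logical_and(mask, scratch, out=mask)
    return mask


rng = np.random.default_rng(0)
n = 1_000_000  # smaller than the question's 5,000,000 to keep the demo quick
x, y, z = rng.random(n), rng.random(n), rng.random(n)

mask = build_mask(x, y, z, 0.3)
inds = np.flatnonzero(mask)  # integer indices of the selected elements

# A float64 array costs 8 bytes per element, a bool mask 1 byte per element.
print(x.nbytes, mask.nbytes, inds.size)
```

When only a few hundred elements are selected, `inds` is far smaller than the mask, so it is the cheaper object to keep around for later reuse, for example `x[inds]`, `y[inds]` and `z[inds]`.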

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54909182/
