python - Does Python have a bug in its np.std implementation for large arrays?

Tags: python numpy std variance

I am trying to compute the variance via np.std(array, ddof=0). The problem appears if I happen to have a long constant array, i.e. all values in the array are identical. Instead of returning std = 0, it gives some small value, which in turn can cause further estimation errors. The mean is returned correctly... Example:

np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],ddof = 0)

gives 1.80411241502e-16

but

np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],ddof = 0)

gives std = 0

Is there a way to overcome this, other than checking the data for uniqueness on every iteration and not computing the std at all?

Thanks

P.S. After this was marked as a duplicate of Is floating point math broken?, I am copy-pasting @kxr's reply on why this is a different question:

"The current duplicate marking is wrong. It is not just about simple float comparison, but about the internal aggregation of small errors when np.std is used over long arrays to produce a near-zero result - as the questioner pointed out. Compare e.g. >>> np.std([0.1, 0.1, 0.1, 0.1, 0.1, 0.1]*200000) -> 2.0808632594793153e-12. So he could, for example, work around it with: >>> mean = a.mean(); xmean = round(mean, int(-log10(mean)+9)); std = np.sqrt(((a - xmean) ** 2).sum()/a.size)"

The problem of course starts with the float representation, but it does not stop there. @kxr - I appreciate the comment and the example.
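For reference, here is a minimal runnable sketch of the workaround quoted above, using @kxr's 200000-element constant array; the rounding of the mean to roughly 9 significant digits is taken from his snippet, everything else is my own framing:

import numpy as np
from math import log10

# @kxr's example: a long constant array of 0.1
a = np.array([0.1] * 200000)
print(np.std(a, ddof=0))  # small non-zero value instead of 0, e.g. ~2e-12

# Workaround from the comment: round the mean before subtracting, so the
# deviations of a truly constant array become exactly 0
mean = a.mean()
xmean = round(mean, int(-log10(mean) + 9))
std = np.sqrt(((a - xmean) ** 2).sum() / a.size)
print(std)  # 0.0 for this array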

Best Answer

Welcome to the world of practical numerical algorithms! In real life, if you have two floats x and y, checking x == y makes no sense. So the question of whether the standard deviation is 0 is meaningless; it is either close to 0 or it is not. Let's check with np.isclose:

>>> import numpy as np
>>> np.isclose(1.80411241502e-16, 0)
True

That is the best you can hope for. In real life, you cannot even check that all the elements are identical, as you suggest. Are they floats? Were they produced by some other process? They will have small errors too.
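So in practice, compare the standard deviation against a tolerance rather than against an exact 0. A small sketch of that idea (the tolerance value here is my own arbitrary choice, not part of the answer):

import numpy as np

a = np.array([0.1] * 200000)
std = a.std(ddof=0)  # tiny but non-zero for a constant array

# Treat a negligible standard deviation as an exact 0 for downstream code;
# atol should be scaled to the magnitude and length of your data.
if np.isclose(std, 0.0, atol=1e-9):
    std = 0.0

print(std)  # 0.0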

Regarding "python - Does Python have a bug in its np.std implementation for large arrays?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35574234/
