python - Operations on numpy masked arrays give masked invalid values

From the numpy masked array documentation, on operations on numpy arrays:

The numpy.ma module comes with a specific implementation of most ufuncs. Unary and binary functions that have a validity domain (such as log or divide) return the masked constant whenever the input is masked or falls outside the validity domain: e.g.:

ma.log([-1, 0, 1, 2])
masked_array(data = [-- -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

The problem I have is that, for my calculations, I need to know where these invalid operations occurred. Specifically, I want this:

ma.log([-1, 0, 1, 2])
masked_array(data = [np.nan -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

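At the risk of this question turning into a discussion, my main question is:

What is a good way to get this masked_array, where computed invalid values (the ones that fix_invalid "fixes", such as np.nan and np.inf) are not converted into (and merged with) masked values?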

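My current solution is to compute the function on masked_array.data and then rebuild the masked array with the original mask. However, I am writing an application that maps arbitrary user functions over many different arrays, some masked and some not, and I want to avoid a special handler just for masked arrays. Furthermore, the distinction between MISSING, NaN and Inf in these arrays matters, so I can't simply use arrays with np.nan in place of masked values.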
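A minimal sketch of the workaround described above (evaluate the function on the raw data, then reattach the original mask). The helper name apply_keep_invalid is hypothetical; it relies only on the public np.ma.getdata and np.ma.getmaskarray functions, so it also accepts plain ndarrays, which simply get an all-False mask.

```python
import numpy as np


def apply_keep_invalid(func, arr):
    """Hypothetical helper: apply func to the raw data of arr and reattach
    arr's original mask, so nan/inf produced by func stay visible as data
    instead of being merged into the mask."""
    data = np.ma.getdata(arr)       # underlying ndarray (plain arrays pass through)
    mask = np.ma.getmaskarray(arr)  # full boolean mask, all False if arr has none
    # Silence the RuntimeWarnings that np.log and friends emit for bad inputs.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = func(data)
    return np.ma.masked_array(result, mask=mask)


x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[1, 0, 0, 0])
y = apply_keep_invalid(np.log, x)
print(y.mask)     # only the originally masked entry is masked
print(y.data[1])  # log(0) = -inf is kept as ordinary data
```

Note that the function still runs on whatever values sit under the mask, so this approach assumes element-wise functions whose output at masked positions can be ignored.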

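Also, if anyone has insight into why this behavior exists, I would like to know. It seems odd to do this inside the same operation, since the validity of the result of an operation on unmasked values is really the user's responsibility, and the user can choose to "clean up" afterwards with the fix_invalid function.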
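For reference, the opt-in cleanup mentioned above looks roughly like this: compute with plain numpy (silencing the warnings), keep nan/-inf as ordinary data, and call np.ma.fix_invalid only where merging them into the mask is actually wanted.

```python
import numpy as np

# Plain ndarray computation: invalid results stay in the data as nan / -inf.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = np.log(np.array([-1.0, 0.0, 1.0, 2.0]))

# Explicit, opt-in step: mask every non-finite value. fix_invalid works on a
# copy by default and replaces the invalid data with the fill value.
cleaned = np.ma.fix_invalid(raw)

print(np.isnan(raw[0]), np.isneginf(raw[1]))
print(np.ma.getmaskarray(cleaned))
```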

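Also, if anyone knows how work on missing values in numpy is progressing, please share; the posts I could find date from 2011 to 2012, when there was a debate that never led to anything.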

Edit: 2017-10-30

Adding to hpaulj's answer: defining a log function with a modified domain has a side effect on the behavior of log in the numpy namespace.

In [1]: import numpy as np

In [2]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[2]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

In [3]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)

In [4]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[4]: 
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)

np.log now behaves the same as mylog, but np.ma.log is unchanged:

In [5]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[5]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

Is there a way to avoid this?

Using Python 3.6.2 :: Anaconda custom (64-bit) and numpy 1.12.1

Best answer

Just to clarify what is happening here:

np.ma.log runs np.log on the argument, but it traps the warnings:

In [26]: np.log([-1,0,1,2])
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[26]: array([        nan,        -inf,  0.        ,  0.69314718])

It masks the nan and -inf values and, apparently, copies the original input values into those data slots:

In [27]: np.ma.log([-1,0,1,2])
Out[27]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [28]: _.data
Out[28]: array([-1.        ,  0.        ,  0.        ,  0.69314718])

(Running in Py3; numpy version 1.13.1)

This masking behavior is not unique to ma.log; it is determined by its class:

In [41]: type(np.ma.log)
Out[41]: numpy.ma.core._MaskedUnaryOperation
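To illustrate that the masking comes from the class rather than from log itself, other numpy.ma functions that carry a validity domain behave the same way. np.ma.sqrt (a domained unary operation) and np.ma.divide (a domained binary operation) are used here as examples.

```python
import numpy as np

# sqrt: negative inputs fall outside the domain and are masked.
s = np.ma.sqrt([-4.0, 4.0])
# divide: division by zero falls outside the domain and is masked.
d = np.ma.divide([1.0, 1.0], [0.0, 2.0])

print(type(np.ma.sqrt).__name__)
print(s)
print(d)
```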

In np.ma.core, log is defined with fill and domain attributes:

log = _MaskedUnaryOperation(umath.log, 1.0,
                        _DomainGreater(0.0))

So the valid (unmasked) domain is > 0; domain returns True for the inputs that fall outside it:

In [47]: np.ma.log.domain([-1,0,1,2])
Out[47]: array([ True,  True, False, False], dtype=bool)

This domain mask is OR-ed with

In [54]: ~np.isfinite(np.log([-1,0,1,2]))
...
Out[54]: array([ True,  True, False, False], dtype=bool)

which here has the same values.
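Putting these pieces together, the masking np.ma.log applies can be approximated with public numpy calls. This is an illustrative sketch of the logic described above, not numpy's actual code: the function name ma_log_sketch is hypothetical, the d <= 0 test stands in for _DomainGreater(0.0), and np.where mimics the copying of the original inputs into the masked data slots.

```python
import numpy as np


def ma_log_sketch(a):
    """Hypothetical re-creation of np.ma.log's masking with public functions."""
    d = np.ma.getdata(a)
    with np.errstate(divide="ignore", invalid="ignore"):
        result = np.log(d)         # plain log: nan / -inf for bad inputs
    m = ~np.isfinite(result)       # non-finite results
    m |= d <= 0                    # domain check, like _DomainGreater(0.0)
    m |= np.ma.getmaskarray(a)     # whatever mask the input already had
    data = np.where(m, d, result)  # original inputs go back into masked slots
    return np.ma.masked_array(data, mask=m)


print(ma_log_sketch([-1, 0, 1, 2]))
print(np.ma.log([-1, 0, 1, 2]))
```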

It looks like I can define a custom log that doesn't add its own domain masking:

In [58]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)
In [59]: mylog([-1,0,1,2])
Out[59]: 
masked_array(data = [        nan        -inf  0.          0.69314718],
             mask = False,
       fill_value = 1e+20)

In [63]: np.ma.masked_array([-1,0,1,2],[1,0,0,0])
Out[63]: 
masked_array(data = [-- 0 1 2],
             mask = [ True False False False],
       fill_value = 999999)
In [64]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[64]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [65]: mylog(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[65]: 
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)
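On the side effect described in the question's edit: reading numpy.ma.core (these are private internals, so treat this as an assumption that may not hold in every numpy version), constructing a _MaskedUnaryOperation records its domain and fill in the module-level dictionaries np.ma.core.ufunc_domain and np.ma.core.ufunc_fills, keyed by the wrapped ufunc, and MaskedArray.__array_wrap__ looks up ufunc_domain whenever a plain ufunc such as np.log is applied to a masked array. Creating mylog without a domain therefore overwrites the entry that np.ma.log registered for np.log. The sketch below saves and restores those entries around the construction; the helper name preserve_ma_registry is hypothetical.

```python
import contextlib

import numpy as np

_MISSING = object()


@contextlib.contextmanager
def preserve_ma_registry(ufunc):
    """Hypothetical helper: restore numpy.ma's private per-ufunc domain/fill
    registry entries for `ufunc` after the with-block (relies on internals)."""
    registries = (np.ma.core.ufunc_domain, np.ma.core.ufunc_fills)
    saved = [reg.get(ufunc, _MISSING) for reg in registries]
    try:
        yield
    finally:
        for reg, value in zip(registries, saved):
            if value is _MISSING:
                reg.pop(ufunc, None)
            else:
                reg[ufunc] = value


with preserve_ma_registry(np.log):
    # mylog stores its own (empty) domain on the instance, so it keeps
    # working after the shared registry entry is restored.
    mylog = np.ma.core._MaskedUnaryOperation(np.log)

x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[1, 0, 0, 0])
with np.errstate(divide="ignore", invalid="ignore"):
    print(np.log(x))  # domain masking is back: the 0 entry is masked
    print(mylog(x))   # custom log: -inf stays unmasked
```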

Original question on Stack Overflow: https://stackoverflow.com/questions/46983061/
