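python - np.isnan on arrays of dtype "object"

I am working with numpy arrays of different data types. I would like to know which elements of any particular array are NaN. Normally, this is what np.isnan is for.

However, np.isnan is not friendly to arrays of data type object (or any string data type):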

>>> str_arr = np.array(["A", "B", "C"])
>>> np.isnan(str_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type

>>> obj_arr = np.array([1, 2, "A"], dtype=object)
>>> np.isnan(obj_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

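All I want from these two calls is np.array([False, False, False]). I can't just put try and except TypeError around my call to np.isnan and assume that any array that raises a TypeError contains no NaNs: after all, I would want np.isnan(np.array([1, np.NaN, "A"])) to return np.array([False, True, False]).

My current solution is to create a new array of type np.float64, loop through the elements of the original array, try to put each element into the new array (leaving it as zero if that fails), and then call np.isnan on the new array. However, this is of course rather slow (at least for large object arrays).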

import numpy as np


def isnan(arr):
    if isinstance(arr, np.ndarray) and (arr.dtype == object):
        # Create a new array of dtype float64, fill it with the same values as the input array (where possible), and
        # then call np.isnan on the new array. This way, np.isnan is only called once. (Much faster than calling it on
        # every element in the input array.)
        new_arr = np.zeros((len(arr),), dtype=np.float64)
        for idx in range(len(arr)):  # range runs on both Python 2 and Python 3
            try:
                new_arr[idx] = arr[idx]
            except Exception:
                pass
        return np.isnan(new_arr)
    else:
        try:
            return np.isnan(arr)
        except TypeError:
            return False

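This particular implementation also only works for one-dimensional arrays, and I can't think of a decent way to make the for loop run over an arbitrary number of dimensions.

Is there a more efficient way to figure out which elements of an object-type array are NaN?

EDIT: I'm running Python 2.7.10.

Note that [x is np.nan for x in np.array([np.nan])] returns [False]: np.nan is not always the same object in memory as a different np.nan.

I do not want the string "nan" to be considered equivalent to np.nan: I want isnan(np.array(["nan"], dtype=object)) to return np.array([False]).

Multi-dimensionality isn't a big issue. (Nothing a little ravel-and-reshaping won't fix. :p)

Any function that relies on the is operator to test whether two NaNs are equivalent isn't always going to work. (If you think it should, ask yourself what the is operator actually does!)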
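To make the points about `is`, the string "nan", and ravel-and-reshape concrete, here is a minimal sketch. The helper name `isnan_object` is hypothetical and not part of numpy. It assumes that only real floating-point values (Python `float` or numpy floating scalars) should count as NaN, so complex values are not handled, and it still loops in Python, so it is not expected to be fast on large arrays.

```python
import math

import numpy as np


def isnan_object(arr):
    """Hypothetical helper: element-wise NaN test for any dtype and shape."""
    arr = np.asarray(arr, dtype=object)
    # Only real floating-point values can be NaN here; strings such as "nan"
    # and other objects are reported as False with no conversion attempt.
    flags = [
        isinstance(x, (float, np.floating)) and math.isnan(x)
        for x in arr.ravel()
    ]
    # ravel() flattened the input, so restore the original shape.
    return np.array(flags, dtype=bool).reshape(arr.shape)


nan_a = float("nan")
print(nan_a is np.nan)  # identity test: a separately created NaN is another object
print(nan_a != nan_a)   # value test: a NaN never compares equal to itself

mixed = np.array([[1, np.nan], ["nan", "A"]], dtype=object)
print(isnan_object(mixed))
```

The type check is what keeps the string "nan" from being treated as a NaN, which a plain conversion to float64 would not do.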

Best answer

If you are willing to use the pandas library, pd.isnull handles this case:

pandas.isnull(obj)

Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

Here is an example:

$ python
>>> import numpy   
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False,  True])
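As a further sketch, the same call can be tried on the cases raised in the question: a mixed object array, the string "nan", and a two-dimensional array. The array contents below are illustrative assumptions, not data from the original post. One caveat follows from the docstring quoted above: in object arrays pd.isnull also flags None as missing, which is broader than a strict NaN test.

```python
import numpy as np
import pandas as pd

# Mixed object array: an int, a real NaN, two strings, and None.
obj_arr = np.array([1, np.nan, "A", "nan", None], dtype=object)
mask = pd.isnull(obj_arr)
print(mask)  # the string "nan" is not flagged, but None is

# The result keeps the shape of a multi-dimensional input.
grid = np.array([[1.0, np.nan], ["nan", "A"]], dtype=object)
grid_mask = pd.isnull(grid)
print(grid_mask)
```

If None should not count as NaN, the mask would need an extra filter on top of this call.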

Regarding python - np.isnan on arrays of dtype "object", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36198118/
