python - What is the best way to take np.percentile along an axis while ignoring NaNs?

Is there a reasonably fast way to compute np.percentile(ndarr, axis=0) on data that contains NaN values?

For np.median, there is the corresponding bottleneck.nanmedian (https://pypi.python.org/pypi/Bottleneck), which works quite well.

The best I have come up with for percentiles is:

   from bottleneck import nanrankdata, nanmax, nanargmin
   def nanpercentile(x, q, axis):
       ranks = nanrankdata(x, axis=axis)
       peak = nanmax(ranks, axis=axis)
       pct = ranks/peak / 100. # to make a percentile
       wh = nanargmin(abs(pct-q),axis=axis)
       return x[wh]

This does not work; what is really needed is a way to get the n-th element along the axis, but I have not found the numpy slicing trick to do that.
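One way to pick a different n-th element per slice is np.take_along_axis (available in numpy 1.15 and later). The following is only a sketch, not the asker's method: the function name nanpercentile_sketch is hypothetical, and it assumes linear interpolation between the two nearest ranks and that np.sort places NaNs at the end of each slice.

```python
import numpy as np

def nanpercentile_sketch(x, q, axis=-1):
    """Hypothetical helper: q-th percentile along `axis`, ignoring NaNs."""
    # np.sort moves NaNs to the end of each 1-D slice.
    xs = np.sort(x, axis=axis)
    # Count of non-NaN values per slice; keepdims so it can be used as an index array.
    n = np.sum(~np.isnan(x), axis=axis, keepdims=True)
    # Fractional position of the percentile among the valid values.
    pos = (n - 1) * q / 100.0
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = pos - lo
    # Take a different element from each slice along the axis.
    v_lo = np.take_along_axis(xs, lo, axis=axis)
    v_hi = np.take_along_axis(xs, hi, axis=axis)
    return np.squeeze(v_lo + (v_hi - v_lo) * frac, axis=axis)

x = np.array([[[1, 2, 3], [6, np.nan, 4]],
              [[0.5, 2, 1], [9, 3, np.nan]]])
result = nanpercentile_sketch(x, 40, axis=2)
print(result)
```

A slice that is entirely NaN yields NaN here, because both indices fall on a NaN entry of the sorted slice.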

"Reasonably fast" means better than looping over the indices, e.g.:

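import numpy as np

q = 40
x = np.array([[[1,2,3],[6,np.nan,4]],[[0.5,2,1],[9,3,np.nan]]])
out = np.empty(x.shape[:-1])
# Loop over every 1-D slice along the last axis
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        d = x[i,j,:]
        # Drop the NaNs before taking the percentile of this slice
        out[i,j] = np.percentile(d[np.isfinite(d)], q)

print(out)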

#array([[ 1.8,  4.8],
#       [ 0.9,  5.4]])

This works, but it can be very slow.

np.ma does not seem to work as expected; it treats the nan values as inf:

xm = np.ma.masked_where(np.isnan(x),x)
print(np.percentile(xm,40,axis=2))

# array([[ 1.8,  5.6],
#        [ 0.9,  7.8]])

Best answer

np.nanpercentile was added in numpy 1.9.0:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.nanpercentile.html
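As a minimal illustration, assuming numpy 1.9.0 or later is installed, np.nanpercentile can be applied directly to the array from the question. The variable names below mirror the question's example; the result should agree with the values produced by the explicit loop, up to floating-point precision.

```python
import numpy as np

x = np.array([[[1, 2, 3], [6, np.nan, 4]],
              [[0.5, 2, 1], [9, 3, np.nan]]])

# 40th percentile along the last axis, with NaNs ignored in each slice
out = np.nanpercentile(x, 40, axis=2)
print(out)
```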

Regarding "python - What is the best way to take np.percentile along an axis while ignoring NaNs?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23123034/
