问题
我想确定我何时遇到了一个真值,并为数组的其余部分维护该值……为特定的 bin。从 Numpy 的角度来看,它就像是 numpy.logical_or.accumulate
和 numpy.logical_or.at
的组合。
例子
考虑 a
中的真值、b
中的 bin 和 c
中的预期输出。
我将 0
用于 False
并将 1
用于 True
然后转换为 bool
为了对齐数组值。
a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]).astype(bool)
b = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 3, 3, 0, 1, 2, 3])
# zeros ↕ ↕ ↕ ↕ ↕ ↕ ↕
# ones ↕ ↕ ↕ ↕ ↕
# twos ↕ ↕
# threes ↕ ↕ ↕
c = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]).astype(bool)
# ╰─────╯ ↑ ↑ ↑ ↑
# zero bin no True yet │ │ │ two never had a True
# one bin first True │ three bin first True
# zero bin first True
我尝试过的
我可以遍历每个值并跟踪关联的 bin 是否已经看到 True
值。
tracker = np.zeros(4, bool)
result = np.zeros(len(b), bool)
for i, (truth, bin_) in enumerate(zip(a, b)):
tracker[bin_] |= truth
result[i] = tracker[bin_]
result * 1
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
但我希望有一个 O(n) 时间的 Numpy 解决方案。我可以选择使用像 Numba 这样的 JIT 包装器,但我宁愿只使用 Numpy。
最佳答案
O(n) 解
def cumulative_linear_seen(seen, bins):
"""
Tracks whether or not a value has been observed as
True in a 1D array, and marks all future values as
True for these each individual value.
Parameters
----------
seen: ndarray
One-hot array marking an occurence of a value
bins: ndarray
Array of bins to which occurences belong
Returns
-------
One-hot array indicating if the corresponding bin has
been observed at a point in time
"""
# zero indexing won't work with logical and, need to 1-index
one_up = bins + 1
# Next step is finding where each unique value is seen
occ = np.flatnonzero(a)
v_obs = one_up[a]
# We can fill another mapping array with these occurences.
# then map by corresponding index
i_obs = np.full(one_up.max() + 1, seen.shape[0] + 1)
i_obs[v_obs] = occ
# Finally, we create the map and compare to an array of
# indices from the original seen array
seen_idx = i_obs[one_up]
return (seen_idx <= np.arange(seen_idx.shape[0])).astype(int)
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
PiR 的贡献
基于以上见解
r = np.arange(len(b))
one_hot = np.eye(b.max() + 1, dtype=bool)[b]
np.logical_or.accumulate(one_hot & a[:, None], axis=0)[r, b] * 1
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
较早的尝试
为了让事情开始,这里有一个解决方案,虽然是矢量化的,但不是 O(n)。我相信存在与此类似的 O(n) 解决方案,我会处理复杂性:-)
尝试 1
q = b + 1
u = sparse.csr_matrix(
(a, q, np.arange(a.shape[0] + 1)), (a.shape[0], q.max()+1)
)
m = np.maximum.accumulate(u.A) * np.arange(u.shape[1])
r = np.where(m[:, 1:] == 0, np.nan, m[:, 1:])
(r == q[:, None]).any(1).view(np.int8)
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=int8)
尝试 2
q = b + 1
m = np.logical_and(a, q)
r = np.flatnonzero(u)
t = q[m]
f = np.zeros((a.shape[0], q.max()))
f[r, t-1] = 1
v = np.maximum.accumulate(f) * np.arange(1, q.max()+1)
(v == q[:, None]).any(1).view(np.int8)
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=int8)
关于python - 箱内累积逻辑或,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55955372/