python - Vectorizing this non-unique-key operation

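I have raw data with non-unique values in an array named test. From this input I want to build an output vector, given a set of rows (the values that get a non-zero output) and data (the outputs for those values).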

import numpy as np

rows = np.array([3, 4])
test = np.array([1, 3, 3, 4, 5])
data = np.array([-1, 2])

My expected output is a vector of shape test.shape.

For each element of test at position j:

if test[j] equals rows[i] for some index i, output[j] = data[i]
otherwise, output[j] = 0

In other words, the following generates my output.

output = np.zeros(test.shape)
for i, val in enumerate(rows):
    output[test == val] = data[i]

Is there any way to vectorize this?

Best answer

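Here's a vectorized approach based on searchsorted. Note that np.searchsorted expects rows to be sorted in ascending order, as it is in the question -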

# Get the insertion position of each test value within the sorted rows
idx = np.searchsorted(rows, test)

# Set out-of-bounds (invalid) positions to some dummy index, say 0
idx[idx==len(rows)] = 0

# Get the invalid mask by indexing the rows array with those indices
# and flagging the places where the looked-up key does not match test
invalid_mask = rows[idx] != test

# Index data to get the output (a new array) and set invalid places to 0
out = data[idx]
out[invalid_mask] = 0
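
As a rough sketch, the steps above could be wrapped in a helper that also copes with rows that are not already sorted. The function name map_keys and the argsort step are my own additions for illustration, not part of the answer; it assumes test is a 1-D array, rows is non-empty and holds unique keys.

```python
import numpy as np

def map_keys(test, rows, data):
    """Hypothetical helper: look up each value of `test` in `rows`
    and return the matching entry of `data`, or 0 when there is no match."""
    test = np.asarray(test)
    rows = np.asarray(rows)
    data = np.asarray(data)

    # searchsorted needs sorted keys, so sort rows and reorder data to match
    # (this step is an addition; the answer assumes rows is already sorted)
    order = np.argsort(rows)
    rows_sorted = rows[order]
    data_sorted = data[order]

    # Insertion position of each test value among the sorted keys
    idx = np.searchsorted(rows_sorted, test)

    # Positions past the end are certain misses; point them at a dummy index
    idx[idx == len(rows_sorted)] = 0

    # A hit is a position whose key actually equals the test value
    hit = rows_sorted[idx] == test
    return np.where(hit, data_sorted[idx], 0)

test = np.array([1, 3, 3, 4, 5])
out_sorted = map_keys(test, np.array([3, 4]), np.array([-1, 2]))
out_unsorted = map_keys(test, np.array([4, 3]), np.array([2, -1]))
print(out_sorted)    # [ 0 -1 -1  2  0]
print(out_unsorted)  # [ 0 -1 -1  2  0]
```

Unlike the loop in the question, which starts from np.zeros and so returns floats, this keeps the integer dtype of data.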

There are two alternatives for the last couple of lines, if you dig one-liners -

out = data[idx] * (rows[idx] == test) # skips using `invalid_mask`

out = np.where(invalid_mask, 0, data[idx])
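
One way to convince yourself the variants agree is to compare them against the original loop on random input. This is only a sanity-check sketch: the extra key 7, the random test array, and the names expected, out_a, out_b, out_c are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
rows = np.array([3, 4, 7])          # sorted keys (made-up example values)
data = np.array([-1, 2, 5])
test = rng.integers(0, 10, size=50)

# Reference result from the loop in the question
expected = np.zeros(test.shape)
for i, val in enumerate(rows):
    expected[test == val] = data[i]

# Shared setup from the searchsorted approach
idx = np.searchsorted(rows, test)
idx[idx == len(rows)] = 0
invalid_mask = rows[idx] != test

# Variant 1: index, then zero out the misses in place
out_a = data[idx]
out_a[invalid_mask] = 0

# Variant 2: multiply by the boolean match mask
out_b = data[idx] * (rows[idx] == test)

# Variant 3: np.where on the invalid mask
out_c = np.where(invalid_mask, 0, data[idx])

all_equal = all(np.array_equal(expected, o) for o in (out_a, out_b, out_c))
print(all_equal)  # True
```

One caveat on the multiplication variant: if data could hold inf or nan, multiplying by 0 gives nan rather than 0, so the masking or np.where forms look like the safer choice in that case.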

Source: "python - Vectorizing this non-unique-key operation", a similar question found on Stack Overflow: https://stackoverflow.com/questions/49903830/
