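I am trying to reorder an array by sampling every nth element. My current solution works, but it feels like there should be one that does not involve concatenation.
My current implementation is below.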
import numpy as np
arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]
# sample every 5th element
res = np.empty(shape=0)
for i in range(5):
    res = np.concatenate([res, arr[i::5]])
print(res)
[0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]
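I am looking for any tips to make this faster or more Pythonic. My use case is arrays with about 10,000 values.
Best answer
Reshape the vector into a 2D array with N elements per row, then flatten it in column order: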
import numpy as np
# Pick "subsample stride"
N = 5
# Create a vector with length divisible by N.
arr = np.arange(2 * N)
print(arr)
# Reshape arr into a 2D array with N elements per row and however many
# rows are required.
# Flatten it with "F" ordering for "Fortran style" (column-major).
output = arr.reshape(-1, N).flatten("F")
print(output)
Output
[0 1 2 3 4 5 6 7 8 9]
[0 5 1 6 2 7 3 8 4 9]
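The reshape approach needs the array length to be divisible by N, as the comment in the code notes. For lengths that are not, one possible alternative (a sketch, not part of the original answer) is to group positions by their remainder modulo N with a stable argsort. The helper name `subsample_order` is hypothetical, and since sorting is O(n log n) this is likely slower than the reshape; it was not included in the timings.

```python
import numpy as np

def subsample_order(arr, n):
    """Reorder arr as arr[0::n], arr[1::n], ..., arr[n-1::n] joined together.

    Hypothetical helper for illustration; works for any array length.
    """
    arr = np.asarray(arr)
    # Group each position by its remainder modulo n. A stable sort keeps the
    # original order of positions within each remainder group.
    idx = np.argsort(np.arange(arr.size) % n, kind="stable")
    return arr[idx]

print(subsample_order(np.arange(10), 5).tolist())  # [0, 5, 1, 6, 2, 7, 3, 8, 4, 9]
print(subsample_order(np.arange(11), 5).tolist())  # [0, 5, 10, 1, 6, 2, 7, 3, 8, 4, 9]
```

When the length is divisible by N, this gives the same ordering as `arr.reshape(-1, N).flatten("F")`.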
Performance comparison
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: def sol0(arr):
   ...:     """OP's original solution."""
   ...:     res = np.empty(shape=0)
   ...:     for i in range(5):
   ...:         res = np.concatenate([res, arr[i::5]])
   ...:     return res
   ...:
In [3]: def sol1(arr):
   ...:     """This answer's solution."""
   ...:     return arr.reshape(-1, 5).flatten("F")
   ...:
In [4]: def sol2(arr):
   ...:     """@seralouk's solution, with shape error patch"""
   ...:     res = np.empty((5, arr.size//5), order='F')
   ...:     for i in range(5):
   ...:         res[i::5] = arr[i::5]
   ...:     return res.reshape(-1)
In [5]: arr = np.arange(10_000)
In [6]: %timeit sol0(arr)
26.6 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit sol1(arr)
7.81 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit sol2(arr)
36.3 µs ± 841 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
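The timings only compare speed. As a quick sanity check, the sketch below (not part of the original answer) restates the three functions and confirms that they produce the same ordering on the benchmark input. Note that `sol0` and `sol2` return float arrays, because `np.empty` defaults to float64, while `sol1` keeps the integer dtype of the input.

```python
import numpy as np

def sol0(arr):
    """OP's original solution: concatenate the strided slices."""
    res = np.empty(shape=0)
    for i in range(5):
        res = np.concatenate([res, arr[i::5]])
    return res

def sol1(arr):
    """Reshape to rows of 5, then flatten in column-major order."""
    return arr.reshape(-1, 5).flatten("F")

def sol2(arr):
    """Fill a preallocated (5, n/5) array row by row, then flatten."""
    res = np.empty((5, arr.size // 5), order="F")
    for i in range(5):
        # With only 5 rows, res[i::5] selects just row i.
        res[i::5] = arr[i::5]
    return res.reshape(-1)

arr = np.arange(10_000)
# array_equal compares values, so the int/float dtype difference is ignored.
print(np.array_equal(sol0(arr), sol1(arr)))  # True
print(np.array_equal(sol1(arr), sol2(arr)))  # True
print(sol0(arr).dtype, sol1(arr).dtype, sol2(arr).dtype)
```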
Regarding "python - Sort by subsampling every nth element in a numpy array?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75423163/