python - Order by subsampling every nth element in a numpy array?

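I am trying to reorder an array by sampling every nth element. My current solution works, but it feels like there should be a solution that does not involve concatenation.

My current implementation is as follows.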

arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]

# sample every 5th element
res = np.empty(shape=0)
for i in range(5):
    res = np.concatenate([res, arr[i::5]])
    
print(res)
[0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]

I am looking for any tricks to make this faster or more Pythonic. My use case is arrays of about 10,000 values.

Best answer

Reshape the vector into a 2D array with N elements per row, then flatten it column by column:

import numpy as np

# Pick "subsample stride"
N = 5

# Create a vector with length divisible by N.
arr = np.arange(2 * N)
print(arr)

# Reshape arr into a 2D array with N elements per row and however many
# rows are required.
# Flatten it with "F" ordering for "Fortran style" (column-major).
output = arr.reshape(-1, N).flatten("F")
print(output)

Output

[0 1 2 3 4 5 6 7 8 9]
[0 5 1 6 2 7 3 8 4 9]
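
The example above assumes the array length is divisible by N; otherwise `reshape(-1, N)` raises a `ValueError`. Below is a minimal sketch of a wrapper that uses the reshape trick when it applies and otherwise falls back to a single concatenation of the strided slices. The name `subsample_reorder` and the fallback are assumptions added for illustration, not part of the original answer.

```python
import numpy as np


def subsample_reorder(arr, n):
    """Return arr[0::n], arr[1::n], ..., arr[n-1::n] joined end to end.

    Hypothetical helper for illustration; not a NumPy function.
    """
    arr = np.asarray(arr)
    if arr.ndim != 1:
        raise ValueError("expected a 1-D array")
    if n < 1:
        raise ValueError("n must be a positive integer")
    if arr.size % n == 0:
        # Fast path: rows of length n, read out column by column.
        return arr.reshape(-1, n).flatten("F")
    # Fallback (assumption): reshape would fail here, so build all
    # strided slices first and concatenate them once.
    return np.concatenate([arr[i::n] for i in range(n)])


print(subsample_reorder(np.arange(10), 5))  # [0 5 1 6 2 7 3 8 4 9]
print(subsample_reorder(np.arange(7), 3))   # [0 3 6 1 4 2 5]
```

Both paths keep the input dtype. The loop in the question starts from `np.empty(shape=0)`, which is float64, so its result is printed as floats (`[0. 5. 1. ...]`).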

Performance comparison

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: def sol0(arr):
   ...:     """OP's original solution."""
   ...:     res = np.empty(shape=0)
   ...:     for i in range(5):
   ...:         res = np.concatenate([res, arr[i::5]])
   ...:     return res
   ...: 

In [3]: def sol1(arr):  
   ...:     """This answer's solution."""
   ...:     return arr.reshape(-1, 5).flatten("F")
   ...: 

In [4]: def sol2(arr):
   ...:     """@seralouk's solution, with shape error patch"""
   ...:     res = np.empty((5, arr.size//5), order='F')
   ...:     for i in range(5):
   ...:         res[i::5] = arr[i::5]
   ...:     return res.reshape(-1)

In [5]: arr = np.arange(10_000)

In [6]: %timeit sol0(arr)
26.6 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit sol1(arr)
7.81 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit sol2(arr)
36.3 µs ± 841 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
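
To repeat the comparison outside IPython, the `%timeit` calls can be approximated with the standard `timeit` module. This is a rough sketch covering only `sol0` and `sol1`; the repeat counts are arbitrary choices, and the timings will differ by machine and NumPy version, so no expected numbers are shown.

```python
import timeit

import numpy as np

N = 5  # stride used in the session above


def sol0(arr):
    """Loop with repeated concatenation, as in the question."""
    res = np.empty(shape=0)
    for i in range(N):
        res = np.concatenate([res, arr[i::N]])
    return res


def sol1(arr):
    """Reshape, then flatten in column-major order."""
    return arr.reshape(-1, N).flatten("F")


arr = np.arange(10_000)

# Both must agree on values (sol0 returns float64, sol1 keeps the int dtype).
assert np.array_equal(sol0(arr), sol1(arr))

for fn in (sol0, sol1):
    # Best of 5 repeats of 1000 calls, reported per call in microseconds.
    best = min(timeit.repeat(lambda: fn(arr), number=1_000, repeat=5))
    print(f"{fn.__name__}: {best / 1_000 * 1e6:.2f} µs per call")
```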

Source: a similar question on Stack Overflow about ordering a numpy array by subsampling every nth element: https://stackoverflow.com/questions/75423163/
