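python - Is there an easy way to transform a sequence with missing numbers into the ranges of its gaps?

Suppose I have a list like this: [1,2,3,4,9,10,11,20]. I need a result like this: [[4,9],[11,20]]. I defined a function like this: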

def get_range(lst):
    i=0
    seqrange=[]
    for new in lst:
        a=[]
        start=new
        end=new
        if i==0:
            i=1
            old=new
        else:
            if new - old >1:
                a.append(old)
                a.append(new)
        old=new
        if len(a):
            seqrange.append(a)
    return seqrange

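Is there a simpler, more efficient way to do this? I need to run it on ranges in the millions.

Best answer

You can use numpy arrays and the diff function that comes with them. With millions of rows, Numpy is much more efficient than a Python loop.

A brief aside: why are numpy arrays so fast? Because they are arrays of data rather than arrays of pointers to data (which is what Python lists are), because they offload a whole bunch of computation to a backend written in C, and because they leverage the SIMD paradigm to run a Single Instruction on Multiple Data simultaneously.

Now back to the problem at hand:

The diff function gives us the difference between consecutive elements of the array. That is very convenient, since we need to find where this difference is greater than a known threshold!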

import numpy as np

threshold = 1
arr = np.array([1,2,3,4,9,10,11,20])

deltas = np.diff(arr)
# There's a gap wherever the delta is greater than our threshold
gaps = deltas > threshold 
gap_indices = np.argwhere(gaps)

gap_starts = arr[gap_indices]
gap_ends = arr[gap_indices + 1] 

# Finally, stack the two arrays horizontally
all_gaps = np.hstack((gap_starts, gap_ends))
print(all_gaps)
# Output: 
# [[ 4  9]
#  [11 20]]
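To make the intermediate steps of the snippet above easier to follow, here is a minimal sketch that prints the values along the way for the same example array. It uses only calls already shown above. The point to notice is that np.argwhere returns one row per match, so indexing with it yields column vectors, which is why np.hstack ends up producing a two-column result.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 9, 10, 11, 20])

deltas = np.diff(arr)                  # differences between neighbouring elements
gap_indices = np.argwhere(deltas > 1)  # one row per gap, shape (n_gaps, 1)
gap_starts = arr[gap_indices]          # column vector, shape (n_gaps, 1)

print(deltas.tolist())       # [1, 1, 1, 5, 1, 1, 9]
print(gap_indices.tolist())  # [[3], [6]]
print(gap_starts.shape)      # (2, 1)
```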

You can access all_gaps like a 2D matrix: for example, all_gaps[0, 1] gives you 9. If you really need the answer as a list, simply convert it like so:

all_gaps_list = all_gaps.tolist()
print(all_gaps_list)
# Output: [[4, 9], [11, 20]]
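As a variation on the same idea, the steps can be wrapped in one small function. This is only a sketch, not part of the original answer: the name find_gaps is hypothetical, and it swaps in np.flatnonzero and np.column_stack (both standard NumPy functions) for np.argwhere and np.hstack, so the indices stay one-dimensional and the result still has shape (0, 2) when there are no gaps.

```python
import numpy as np

def find_gaps(values, threshold=1):
    """Return an (n_gaps, 2) array of [last value before gap, first value after gap].

    Hypothetical helper for illustration; assumes `values` is sorted ascending.
    """
    arr = np.asarray(values)
    # 1-D indices where the jump to the next element exceeds the threshold
    idx = np.flatnonzero(np.diff(arr) > threshold)
    # Pair each gap start with the element that follows it
    return np.column_stack((arr[idx], arr[idx + 1]))

print(find_gaps([1, 2, 3, 4, 9, 10, 11, 20]).tolist())
# [[4, 9], [11, 20]]
```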

Comparing the runtime of the iterative approach from @happydave's answer with the numpy approach:

import random
import timeit

import numpy as np

def gaps1(arr, threshold):
    deltas = np.diff(arr)
    gaps = deltas > threshold 
    gap_indices = np.argwhere(gaps)
    gap_starts = arr[gap_indices]
    gap_ends = arr[gap_indices + 1] 
    all_gaps = np.hstack((gap_starts, gap_ends))
    return all_gaps

def gaps2(lst, thr):
    seqrange = []
    for i in range(len(lst)-1):
        if lst[i+1] - lst[i] > thr:
            seqrange.append([lst[i], lst[i+1]])
    return seqrange

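test_list = list(range(100000))
for i in range(100):
    # pop() removes by index; remove() deletes by value and raises
    # ValueError if the random value was already removed earlier
    test_list.pop(random.randint(0, len(test_list) - 1))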

test_arr = np.array(test_list)

# Make sure both give the same answer:
assert np.all(gaps1(test_arr, 1) == gaps2(test_list, 1))

t1 = timeit.timeit('gaps1(test_arr, 1)', setup='from __main__ import gaps1, test_arr', number=100)
t2 = timeit.timeit('gaps2(test_list, 1)', setup='from __main__ import gaps2, test_list', number=100)

print(f"t1 = {t1}s; t2 = {t2}s; Numpy gives ~{t2 // t1}x speedup")

On my laptop, this gives:

t1 = 0.020834800001466647s; t2 = 1.2446780000027502s; Numpy gives ~59.0x speedup

My word, that's fast!
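For smaller inputs, or where NumPy is not installed, the same neighbour-by-neighbour comparison can also be sketched in pure Python with zip. The function name find_gaps_py is hypothetical, and no timing is claimed for it; note that values[1:] makes a copy of the list, which matters for very large inputs.

```python
def find_gaps_py(values, threshold=1):
    """Pure-Python sketch: pair each element with its successor and keep
    the pairs whose difference exceeds the threshold.

    Hypothetical helper for illustration; assumes `values` is a sorted list.
    """
    return [[a, b] for a, b in zip(values, values[1:]) if b - a > threshold]

print(find_gaps_py([1, 2, 3, 4, 9, 10, 11, 20]))
# [[4, 9], [11, 20]]
```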

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64247480/
