python - 通过允许容差匹配两个包含略有不同的浮点值的列表

我有两个包含浮点值的排序列表。第一个包含我感兴趣的值 (l1)，第二个列表包含我要搜索的值 (l2)。但是，我不是在寻找精确匹配，而是容忍基于函数的差异。由于我经常进行此搜索 (>>100000) 并且列表可能非常大(~5000 和~200000 个元素)，所以我对运行时非常感兴趣。起初，我以为我可以以某种方式使用 numpy.isclose()，但我的容忍度不是固定的，而是取决于感兴趣的值。几个嵌套的 for 循环可以工作，但速度很慢。我相信有一些有效的方法可以做到这一点。

#check if two floats are close enough to match
def matching(mz1, mz2):
    if abs( (1-mz1/mz2) * 1000000) <= 2:
        return True
    return False

#imagine another huge for loop around everything
l1  = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2  = [132.0317, 132.0318, 132.8678, 132.8861, 132.8862, 133.5851999, 133.7500]

d = {i:[] for i in l1}
for i in l1:
    for j in l2:
        if matching(i, j):
            d[i].append(j)

fyi:作为匹配函数的替代方法，我还可以先创建一个字典，将感兴趣的值从 l1 映射到窗口 (min ,max)我会允许的。例如{132.0317:(132.0314359366, 132.0319640634), ...}，但我认为检查 l2 中的每个值是否位于该字典的一个窗口内更慢...

这将是如何生成包含 l1 中每个值的最小/最大值的字典:

def calcMinMaxMZ(mz, delta_ppm=2):
    minmz = mz- (mz* +delta_ppm)/1000000
    maxmz = mz- (mz* -delta_ppm)/1000000
    return minmz, maxmz

minmax_d = {mz:calcMinMaxMZ(mz, delta_ppm=2) for mz in l1}

结果可能是这样的字典: d = {132.0317: [132.0317, 132.0318], 132.8677: [132.8678], 132.8862: [132.8862, 132.8861], 133.5852: [133.5851999], 133.7507: []} 但实际上我做的更多，当有比赛时。

感谢任何帮助!

最佳答案

我使用 itertools 重新实现了 for 循环。为了使其正常工作，必须对输入进行排序。对于基准测试，我从 <130.0, 135.0> 为 l1 生成了 1000 个项目，从 <130.0, 135.0> 为 l2 生成了 100_000 个项目:

from timeit import timeit
from itertools import tee
from random import uniform

#check if two floats are close enough to match
def matching(mz1, mz2):
    if abs( (1-mz1/mz2) * 1000000) <= 2:
        return True
    return False

#imagine another huge for loop around everything
l1 = sorted([uniform(130.00, 135.00) for _ in range(1000)])
l2 = sorted([uniform(130.00, 135.00) for _ in range(100_000)])

def method1():
    d = {i:[] for i in l1}
    for i in l1:
        for j in l2:
            if matching(i, j):
                d[i].append(j)
    return d

def method2():
    iter_2, last_match = tee(iter(l2))
    d = {}
    for i in l1:
        d.setdefault(i, [])
        found = False
        while True:
            j = next(iter_2, None)
            if j is None:
                break
            if matching(i, j):
                d[i].append(j)
                if not found:
                    iter_2, last_match = tee(iter_2)
                    found = True
            else:
                if found:
                    break
        iter_2, last_match = tee(last_match)
    return d

print(timeit(lambda: method1(), number=1))
print(timeit(lambda: method2(), number=1))

打印:

16.900722101010615
0.030588202003855258

关于python - 通过允许容差匹配两个包含略有不同的浮点值的列表，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/57361453/

python - 通过允许容差匹配两个包含略有不同的浮点值的列表

上一篇：python - 如何在 Python 中向量化增量

下一篇：python - 我知道一些 CRC32 哈希的长度和限制字符集。这是否更容易扭转它们？