python - How can I check in parallel whether an item exists in multiple lists in Python?

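As part of my project, for each word in the dictionary d (shown in the sample code below), I need to check whether it occurs in the different lists f1, f2, f3. I am showing only 3 lists here. Depending on where the word occurs, I need to compute two output values (rule inputs and weights). The problem I am facing is that a word can occur in any number of the lists: say word1 from dict d occurs in lists f1, f2 and f3 (as shown below), word2 occurs in f1 and f2, and word3 occurs in only one list, f3. I have hundreds of such individual lists. I need an efficient and straightforward way to compute the output values (rule inputs and weights) for each word in d based on which of these lists it occurs in, so that I don't have to check every combination of occurrences and write a separate condition for each, which makes things complicated and ugly.

P.S.: The lists are of different sizes. In the example below, f1, f2 and f3 have different sizes.

My code: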

import itertools

d = {'Rosa': 0.023, 'code': 0.356, 'Syntel': 0.144, 'Robotics': 0.245, 'Web': .134, 'sanskrit': 0.23, 'Tamil': 0.23}
f1 = [['Syntel', 0.2, 4, 0.46, 7, 0.9], ['code', 0.45, 9, 0.43, 2, 0.23], ['Robotics', .43, 3, .1, 3, .73]]
f2 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['code', 0.05, 1, 0.28, 2, 0.73]]
f3 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['sanskrit', 0.05, 1, 0.28, 2, 0.73], ['Tamil', 0.23, 4, .13, 5, .23], ['code', 0.32, 4, 0.12, 4, .24]]

# specific case where I am checking if a word of the dictionary occurs in all of the lists f1, f2 and f3
# I have to write chunk of code for every possible combo of occurrence which I think is a bad approach
# I am brain stuck ! Help please !!
for word, score in d.iteritems():
    for x in f1:
        if word == x[0]:
            for y in f2:
                if word == y[0]:
                    for z in f3:
                        if word == z[0]:
                            A = x[2] * x[3]
                            B = x[4] * x[5]
                            C = y[2] * y[3] + 1
                            D = y[4] * y[5] + 1
                            E = z[2] * z[3] + 1
                            F = z[4] * z[5] + 1
                            mfs = [[A, B], [C, D], [E, F]]
                            weights = sum([x[3], x[5], y[3], y[5], z[3], z[5]])
                            rule_inputs = list(itertools.product(*mfs))
                            len_comb = len(rule_inputs)
                            # 6 --> need code to find this automatically
                            weight_factor = (len(mfs) * len_comb) / 6
                            weights *= weight_factor
                            rule_inputs = sum([sum(r) for r in rule_inputs])
                            print word, rule_inputs, weights

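Best answer

As Joel Cornett said, you should probably be using dicts instead of lists in the first place.

But if you need lists for some reason... well, if you are going to search each list many times, you probably want to build a dict to search with: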

d1 = {elem[0]: elem for elem in f1}

Then (with d2 and d3 built the same way from f2 and f3), instead of this:

for z in f3:
    if word == z[0]:

...you can do this:

z = d3.get(word)
if z is not None:
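
As a minimal sketch of the lookup above, written for Python 3 and reusing the sample lists from the question: each list gets its own dict keyed by the word in position 0, and .get returns either the matching row or None. The loop variable name row is only illustrative.

```python
f1 = [['Syntel', 0.2, 4, 0.46, 7, 0.9], ['code', 0.45, 9, 0.43, 2, 0.23], ['Robotics', .43, 3, .1, 3, .73]]
f2 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['code', 0.05, 1, 0.28, 2, 0.73]]
f3 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['sanskrit', 0.05, 1, 0.28, 2, 0.73], ['Tamil', 0.23, 4, .13, 5, .23], ['code', 0.32, 4, 0.12, 4, .24]]

# Build one index per list: word -> full row. Built once, reused for every lookup.
d1 = {row[0]: row for row in f1}
d2 = {row[0]: row for row in f2}
d3 = {row[0]: row for row in f3}

z = d3.get('code')      # the row from f3 whose first element is 'code'
if z is not None:
    print(z[2] * z[3])  # use the row exactly as the nested loop did

print(d1.get('Web'))    # 'Web' does not occur in f1, so .get returns None
```

One difference worth keeping in mind: if a word occurred more than once in the same list, the dict would keep only the last row, whereas the nested loops visit every match.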

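You may also want to follow EAFP (easier to ask forgiveness than permission) and just try the whole thing. Your entire loop then looks like this: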

for word, score in d.iteritems():
    try:
        x, y, z = d1[word], d2[word], d3[word]
    except KeyError:
        continue
    A = x[2] * x[3]
    # etc.
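
For reference, here is one way the "# etc." could be filled in, as a sketch rather than a definitive version. It is written for Python 3 (so d.items() instead of d.iteritems()), copies the formulas from the question unchanged, and collects the values in a dict named results, which is a name introduced here for illustration.

```python
import itertools

d = {'Rosa': 0.023, 'code': 0.356, 'Syntel': 0.144, 'Robotics': 0.245, 'Web': .134, 'sanskrit': 0.23, 'Tamil': 0.23}
f1 = [['Syntel', 0.2, 4, 0.46, 7, 0.9], ['code', 0.45, 9, 0.43, 2, 0.23], ['Robotics', .43, 3, .1, 3, .73]]
f2 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['code', 0.05, 1, 0.28, 2, 0.73]]
f3 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['sanskrit', 0.05, 1, 0.28, 2, 0.73], ['Tamil', 0.23, 4, .13, 5, .23], ['code', 0.32, 4, 0.12, 4, .24]]

# One index per list: word -> full row.
d1, d2, d3 = ({row[0]: row for row in lst} for lst in (f1, f2, f3))

results = {}  # word -> (rule_inputs, weights); illustrative container
for word, score in d.items():
    try:
        x, y, z = d1[word], d2[word], d3[word]
    except KeyError:
        continue  # the word is missing from at least one list
    # Same formulas as in the question.
    A = x[2] * x[3]
    B = x[4] * x[5]
    C = y[2] * y[3] + 1
    D = y[4] * y[5] + 1
    E = z[2] * z[3] + 1
    F = z[4] * z[5] + 1
    mfs = [[A, B], [C, D], [E, F]]
    weights = sum([x[3], x[5], y[3], y[5], z[3], z[5]])
    rule_inputs = list(itertools.product(*mfs))
    # 6 is the hard-coded constant from the question, kept as-is.
    weight_factor = (len(mfs) * len(rule_inputs)) / 6
    weights *= weight_factor
    results[word] = (sum(sum(r) for r in rule_inputs), weights)

print(results)
```

With the sample data only 'code' occurs in all three lists, so it is the only key in results. Note that / is true division in Python 3; here 24 / 6 is exact either way.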

This assumes that you need exactly three lists rather than an arbitrary number. If you need to handle any number of lists, you can do this:

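# list_of_lists holds every list to search, e.g. [f1, f2, f3]
list_of_dicts = [{elem[0]: elem for elem in lst} for lst in list_of_lists]
for word, score in d.iteritems():
    try:
        # use a name other than d here: in Python 2 the comprehension variable leaks and would rebind d
        values = [dct[word] for dct in list_of_dicts]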
    except KeyError:
        continue
    A = values[0][2] * values[0][3]
    # etc.
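
The loop above skips any word that is missing from at least one list. Since the question mentions words that occur in only some of the lists, the sketch below (Python 3) shows both behaviours side by side. The helper names rows_in_all and rows_in_any are hypothetical, and rows_in_any is a variation that is not part of the loop above; how the formulas should treat a partial match is left open here.

```python
d = {'Rosa': 0.023, 'code': 0.356, 'Syntel': 0.144, 'Robotics': 0.245, 'Web': .134, 'sanskrit': 0.23, 'Tamil': 0.23}
f1 = [['Syntel', 0.2, 4, 0.46, 7, 0.9], ['code', 0.45, 9, 0.43, 2, 0.23], ['Robotics', .43, 3, .1, 3, .73]]
f2 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['code', 0.05, 1, 0.28, 2, 0.73]]
f3 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['sanskrit', 0.05, 1, 0.28, 2, 0.73], ['Tamil', 0.23, 4, .13, 5, .23], ['code', 0.32, 4, 0.12, 4, .24]]

list_of_lists = [f1, f2, f3]  # could just as well hold hundreds of lists
list_of_dicts = [{row[0]: row for row in lst} for lst in list_of_lists]


def rows_in_all(word):
    """Return one row per list, or None if any list lacks the word."""
    try:
        return [dct[word] for dct in list_of_dicts]
    except KeyError:
        return None


def rows_in_any(word):
    """Return the rows from whichever lists contain the word (possibly none)."""
    return [dct[word] for dct in list_of_dicts if word in dct]


for word in d:
    # Number of lists each word occurs in.
    print(word, len(rows_in_any(word)))
```

Either helper returns a plain list of rows, so the per-row products and weight terms can be computed in a loop over that list instead of being written out once per combination of lists.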

There are a few alternatives, but this is probably the one you want.

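You could sort each list and use bisect instead of an iterative linear search, or use something like the SortedCollection recipe to wrap that up for you, or blist.sortedlist for a similar type. This makes each search O(log N) instead of O(N), and makes the code simpler. But a dict makes each search O(1) on average instead of O(N), and makes the code just as simple as a sorted list would, so unless you are dealing with unhashable keys (and you aren't), why bother?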
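
For comparison, a rough sketch of the bisect approach using only the standard library (Python 3). The names rows, keys and find_sorted are made up for this example; the rows are sorted by their first element and a parallel list of keys is what bisect actually searches.

```python
import bisect

f3 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['sanskrit', 0.05, 1, 0.28, 2, 0.73], ['Tamil', 0.23, 4, .13, 5, .23], ['code', 0.32, 4, 0.12, 4, .24]]

# Sort the rows by word once, and keep the words in a parallel list.
rows = sorted(f3, key=lambda row: row[0])
keys = [row[0] for row in rows]


def find_sorted(word):
    """Binary search for word; return its row, or None if it is absent."""
    i = bisect.bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return rows[i]
    return None


print(find_sorted('Tamil'))
print(find_sorted('Rosa'))
```

This needs more bookkeeping than the dict version (a sort, a parallel key list and a bounds check) for a slower lookup, which is the trade-off described above.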

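You could also wrap the for/if in a find_in_list function, which gives you the same simplicity as a dict or sortedlist, but with no performance gain. This could be useful if the keys are neither hashable nor sortable, or if you have a huge number of tiny lists (small enough that a linear search is actually faster than a dict or tree lookup, maybe around 2-3 elements?). Otherwise, you are just doing extra work (writing the find_in_list wrapper) to slow yourself down, so again, why bother?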
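
Such a wrapper might look like the sketch below (Python 3). Only the name find_in_list comes from the text above; the signature and the choice to return None for a missing word are assumptions made for this example.

```python
def find_in_list(lst, word):
    """Linear search: return the first row whose first element is word, else None."""
    for row in lst:
        if row[0] == word:
            return row
    return None


f2 = [['Web', 0.5, 5, 0.06, 6, 0.9], ['code', 0.05, 1, 0.28, 2, 0.73]]

y = find_in_list(f2, 'code')
if y is not None:
    print(y[2] * y[3] + 1)  # same use of the row as in the question

print(find_in_list(f2, 'Tamil'))  # not in f2, so None
```

The calling code reads the same as with d2.get(word), but every call still scans the list from the start.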

Regarding python - How can I check in parallel whether an item exists in multiple lists in Python?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15957980/
