python - Efficient way to build a reverse-lookup dictionary?

Suppose I have a dictionary with the following contents:

old_dict = {'a':[0,1,2], 'b':[1,2,3]}

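I want to get a new dictionary whose keys are the values from the old dictionary and whose values are lists of the old dictionary's keys, i.e.: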

new_dict = {0:['a'], 1:['a','b'], 2:['a','b'], 3:['b']}

To do this, I currently use the following sample code:

import numpy as np

# get all the keys for the new dictionary
new_keys = np.unique(np.hstack([old_dict[key] for key in old_dict]))

# initialize new dictionary
new_dict = {key: [] for key in new_keys}
# step through every new key
for new_key in new_keys:
    # step through every old key and check if the new key is in the current list of values
    for old_key in old_dict:
        if new_key in old_dict[old_key]:
            new_dict[new_key].append(old_key)

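In this example I show 2 old keys and 4 new keys, but in my actual problem I have ~10,000 old keys and ~100,000 new keys. Is there a smarter way to do this, perhaps using some tree-based algorithm? I use dictionaries because they make the problem easier for me to visualize, but dictionaries are not necessarily required if there is a better data type for this exercise.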

In the meantime, I am reading through documentation on reverse dictionary lookups and trying to adapt sindex from geopandas to this problem.

Best answer

You can try:

old_dict = {'a':[0,1,2], 'b':[1,2,3]}

new_dict = {}
for k, v in old_dict.items():
    for i in v:
        new_dict.setdefault(i, []).append(k)

print(new_dict)

Prints:

{0: ['a'], 1: ['a', 'b'], 2: ['a', 'b'], 3: ['b']}
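
The same single-pass inversion can also be written with collections.defaultdict, which some readers find easier to follow than setdefault. This is only a sketch of an equivalent variant, not part of the original answer; the name inverted is chosen here for illustration.

```python
from collections import defaultdict

old_dict = {'a': [0, 1, 2], 'b': [1, 2, 3]}

# defaultdict(list) creates an empty list the first time a missing key is accessed
inverted = defaultdict(list)
for old_key, values in old_dict.items():
    for value in values:
        inverted[value].append(old_key)

# convert back to a plain dict so later lookups of missing keys raise KeyError
new_dict = dict(inverted)
print(new_dict)  # {0: ['a'], 1: ['a', 'b'], 2: ['a', 'b'], 3: ['b']}
```

Like the setdefault version, this visits every value in old_dict exactly once, so the work grows with the total number of list entries rather than with the number of old keys times the number of new keys.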

Benchmark:

import numpy as np
from timeit import timeit

old_dict = {'a':[0,1,2], 'b':[1,2,3]}


def f1():
    new_dict = {}
    for k, v in old_dict.items():
        for i in v:
            new_dict.setdefault(i, []).append(k)
    return new_dict

def f2():
    # get all the keys for the new dictionary
    new_keys = np.unique(np.hstack([old_dict[key] for key in old_dict]))

    # initialize new dictionary
    new_dict = {key: [] for key in new_keys}
    # step through every new key
    for new_key in new_keys:
        # step through every old key and check if the new key is in the current list of values
        for old_key in old_dict:
            if new_key in old_dict[old_key]:
                new_dict[new_key].append(old_key)
    return new_dict


t1 = timeit('f1()', number=1000, globals=globals())
t2 = timeit('f2()', number=1000, globals=globals())

print(t1)
print(t2)

Prints:

0.0005186359921935946
0.009738252992974594

With old_dict initialized as follows (the dict now has 10648 items):

from itertools import product
from random import randint

k = 'abcdefghijkloprstuvwyz'
old_dict = {''.join(c): list(range(randint(1, 3), randint(4, 10))) for c in product(k, k, k)}
print(len(old_dict))

Prints:

10648

Running the same benchmark against this larger old_dict prints:

3.126827526008128
19.222182962010265
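
One caveat with the single-pass approach: if a list in old_dict contains the same value more than once, the old key is appended once per occurrence, whereas the nested-loop version records it only once. If repeated values are possible in your data, a sketch like the following avoids the duplicates. The function name invert_unique is hypothetical and not from the original answer.

```python
def invert_unique(mapping):
    """Invert a dict of lists, recording each old key at most once per value."""
    inverted = {}
    for old_key, values in mapping.items():
        # dict.fromkeys drops repeated values while keeping first-seen order
        for value in dict.fromkeys(values):
            inverted.setdefault(value, []).append(old_key)
    return inverted


print(invert_unique({'a': [1, 1, 2], 'b': [2, 3]}))  # {1: ['a'], 2: ['a', 'b'], 3: ['b']}
```

The deduplication adds a small per-list cost, so it is probably only worth including when the input lists are not already free of repeats.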

Regarding python - Efficient way to build a reverse-lookup dictionary?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75816835/
