python - Multi-key, multi-value, non-deterministic Python dictionary

Tags: python dictionary data-structures recommendation-engine fuzzy-logic

There is already a multi key dict in Python, and also a multi-valued dict. I need a Python dictionary that is both:

Example:

# probabilistically fetch any one of baloon, toy or car
d['red','blue','green']== "baloon" or "car" or "toy"  

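d['red'] == d['green'] with high probability, and d['red'] != d['red'] with low probability, but it is possible.

The single output value should be determined probabilistically (fuzzily) from rules on the keys. For example, in the case above the rule could be: if the key has both "red" and "blue", return "balloon" 80% of the time; if it has only blue, return "toy" 15% of the time; otherwise return "car" 5% of the time.

The setitem method should be designed so that the following is possible: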

d["red", "blue"] =[
    ("baloon",haseither('red','green'),0.8),
    ("toy",.....)
    ,....
]

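The above assigns multiple values to the dictionary, each with a predicate function and a corresponding probability. Instead of the assignment list above, a dictionary as the assignment would be even more preferable: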

d["red", "blue"] ={ 
    "baloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}

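In the above, balloon is returned 80% of the time if "red" or "green" is present, toy is returned 15% of the time if blue is present, and car is returned 5% of the time unconditionally.

Is there an existing data structure in Python that already satisfies the above requirements? If not, how can the multikeydict code be modified to meet them?

If a dictionary is used, there can be a configuration file, or appropriately nested decorators, that configure the probabilistic predicate logic above without hard-coding if/else statements.

Note: the above is a useful automaton for a rule-based auto-responder application, so please let me know if any similar rule-based framework is available in Python, even if it does not use the dictionary structure.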

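Best answer

Simulating a multi-key dictionary

multi_key_dict does not allow __getitem__() with multiple keys at once...

(e.g. d["red", "green"])

Multiple keys can be simulated with tuple or set keys. If order does not matter, a set seems best (in practice the hashable frozenset), so that ["red", "blue"] is the same as ["blue", "red"].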
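
As a minimal sketch of why frozenset suits order-insensitive keys (the dictionary name lookup is made up for this illustration and is not part of the answer's class):

```python
# frozenset is hashable and ignores element order, so it can be a dict key.
lookup = {}  # hypothetical name, for illustration only
lookup[frozenset(["red", "blue"])] = "toy"

# The key is the same no matter how the colours are ordered.
same_key = frozenset(["blue", "red"]) == frozenset(["red", "blue"])
value = lookup[frozenset(["blue", "red"])]

# A plain set is mutable and therefore unhashable; it cannot be a key.
try:
    lookup[{"red", "blue"}] = "car"
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(same_key, value, set_is_hashable)
```

A tuple key would also be hashable, but ("red", "blue") and ("blue", "red") would then be two different keys.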

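Simulating a multi-value dictionary

Multiple values are inherent in certain data types; any storage element that can be conveniently indexed will do. A standard dict should provide that.

Non-determinism

Non-deterministic selection is performed with a probability distribution defined by the rules and assumptions (see footnote 1), using the weighted random choice recipe from the Python docs.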
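
A rough sketch of that kind of weighted choice, assuming Python 3 (the function name weighted_choice and the sample distribution are illustrative only; the seeded generator is there just to make the run repeatable):

```python
import bisect
import itertools
import random

def weighted_choice(weighted_items, rng=random):
    """Pick one item from (item, weight) pairs, proportionally to weight."""
    items, weights = zip(*weighted_items)
    cumulative = list(itertools.accumulate(weights))  # running totals
    # Draw a point in [0, total) and locate the bucket that contains it.
    point = rng.random() * cumulative[-1]
    return items[bisect.bisect(cumulative, point)]

rng = random.Random(0)  # seeded only so the sketch is repeatable
dist = [("car", 0.05), ("toy", 0.475), ("ballon", 0.475)]  # spelling as in the answer
counts = {name: 0 for name, _ in dist}
for _ in range(10000):
    counts[weighted_choice(dist, rng)] += 1
print(counts)
```

On Python 3.6 and later, random.choices(items, weights=weights) from the standard library does the same job without the helper.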

MultiKeyMultiValNonDeterministicDict

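What a name. \o/ Nice!

This class takes multiple keys to define a probabilistic rule set over multiple values. During item creation (__setitem__()), the value probabilities for all combinations of the keys are precomputed (see footnote 1). During item access (__getitem__()), the precomputed probability distribution is selected and the result is evaluated by a random weighted choice.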
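
Precomputing for all key combinations means enumerating the power set of the keys, so the stored table grows as 2**n with the number of keys. A small standalone sketch of that enumeration, written as a plain function rather than a method:

```python
import itertools

def key_combinations(keys):
    """Return every subset of keys (the power set) as frozensets."""
    return [
        frozenset(subset)
        for size in range(len(keys) + 1)
        for subset in itertools.combinations(keys, size)
    ]

combos = key_combinations(("red", "blue", "green"))
# Three keys give 2**3 subsets, including the empty set used as the default.
print(len(combos))
```

For a handful of keys this is cheap; for rule sets with many keys the exponential growth would be worth keeping in mind.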

Definition

import random
import operator
import bisect
import itertools

# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

class MultiKeyMultiValNonDeterministicDict(dict):

    def key_combinations(self, keys):
        """get all combinations of keys"""
        return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]

    def multi_val_rule_prob(self, rules, rule):
        """
        assign probabilities for each value, 
        spreading undefined result probabilities
        uniformly over the leftover results not defined by rule.
        """
        all_results = set([result for result_probs in rules.values() for result in result_probs])
        # copy the rule's probabilities so the caller's rule set is not mutated
        prob = dict(rules[rule])
        leftover_prob = 1.0 - sum(prob.values())
        leftover_results = len(all_results) - len(prob)
        for result in all_results:
            if result not in prob:
                # spread undefined prob uniformly over leftover results
                prob[result] = leftover_prob/leftover_results
        return prob

    def multi_key_rule_prob(self, key, val):
        """
        assign probability distributions for every combination of keys,
        using the default for combinations not defined in rule set
        """ 
        combo_probs = {}
        for combo in self.key_combinations(key):
            if combo in val:
                result_probs = self.multi_val_rule_prob(val, combo).items()
            else:
                result_probs = self.multi_val_rule_prob(val, frozenset([])).items()
            combo_probs[combo] = result_probs
        return combo_probs

    def weighted_random_choice(self, weighted_choices):
        """make choice from weighted distribution"""
        choices, weights = zip(*weighted_choices)
        cumdist = list(accumulate(weights))
        return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    def __setitem__(self, key, val):
        """
        set item in dictionary, 
        assigns values to keys with precomputed probability distributions
        """

        precompute_val_probs = self.multi_key_rule_prob(key, val)        
        # use to show ALL precomputed probabilities for key's rule set
        # print precompute_val_probs        

        dict.__setitem__(self, frozenset(key), precompute_val_probs)

    def __getitem__(self, key):
        """
        get item from dictionary, 
        randomly select value based on rule probability
        """
        key = frozenset([key]) if isinstance(key, str) else frozenset(key)             
        val = None
        weighted_val = None        
        if key in self.keys():
            val = dict.__getitem__(self, key)
            weighted_val = val[key]
        else:
            for k in self.keys():
                if key.issubset(k):
                    val = dict.__getitem__(self, k)
                    weighted_val = val[key]

        # used to show probability for key
        # print weighted_val

        if weighted_val:
            prob_results = self.weighted_random_choice(weighted_val)
        else:
            prob_results = None
        return prob_results

Usage

d = MultiKeyMultiValNonDeterministicDict()

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}

Tests

Checking the probabilities

N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}

# Python 3: range() and print() replace the Python 2 xrange and print statement
for _ in range(N):
    red_green_test[d["red","green"]] += 1.0
    red_blue_test[d["red","blue"]] += 1.0
    blue_test[d["blue"]] += 1.0
    default_test[d["green"]] += 1.0
    red_blue_green_test[d["red","blue","green"]] += 1.0

print('red,green test      =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items()))
print('red,blue test       =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items()))
print('blue test           =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items()))
print('default test        =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items()))
print('red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items()))

red,green test      = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test       = car: 05.30% toy: 47.71% ballon: 46.99%
blue test           = car: 41.69% toy: 15.02% ballon: 43.29%
default test        = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%

The probabilities match the rules!


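Footnotes

  1. Distribution assumptions

    Because the rule set is not fully defined, assumptions are made about the probability distributions; most of this is done in multi_val_rule_prob(). Basically, any undefined probability is spread uniformly over the remaining values. This is done for all key combinations and creates a generalized key interface for the random weighted choice.

    Given the example rule set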

    d["red","blue","green"] = {
        # {rule_set} : {result: probability}
        frozenset(["red", "green"]): {"ballon": 0.8},
        frozenset(["blue"]): {"toy": 0.15},
        frozenset([]): {"car": 0.05}
    }
    

    this creates the following distributions

    'red'           = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'green'         = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'blue'          = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
    'blue,red'      = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'green,red'     = [('car', 0.100), ('toy', 0.100), ('ballon', 0.800)]
    'blue,green'    = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'blue,green,red'= [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
     default        = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
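
The distributions listed above follow from giving each result that a rule does not mention an equal share of the leftover probability. A minimal standalone sketch, assuming the same rule set (the function name spread_leftover is made up here; it mirrors what multi_val_rule_prob() does):

```python
def spread_leftover(rules, rule):
    """Give results the rule does not mention an equal share of the leftover probability."""
    all_results = {result for probs in rules.values() for result in probs}
    dist = dict(rules[rule])  # copy, so the rule set itself is not modified
    missing = [r for r in all_results if r not in dist]
    leftover = 1.0 - sum(dist.values())
    for result in missing:
        dist[result] = leftover / len(missing)
    return dist

rules = {
    frozenset(["red", "green"]): {"ballon": 0.8},  # spelling as in the rule set above
    frozenset(["blue"]): {"toy": 0.15},
    frozenset(): {"car": 0.05},
}

blue = spread_leftover(rules, frozenset(["blue"]))
default = spread_leftover(rules, frozenset())
print(sorted(blue.items()))
print(sorted(default.items()))
```

For the "blue" rule, toy keeps its 0.15 and the remaining 0.85 is split evenly between car and ballon, which gives the 0.425 entries in the table.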
    

    If this is incorrect, please advise.

Regarding "python - Multi-key, multi-value, non-deterministic Python dictionary", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29938321/
