根据权重分配的算法,在项目总数未知的情况下,保证良好的分配?

标签 algorithm distribution probability

我想将代币分配到 3 个插槽中。

每个槽位都有一定的权重:也许 50% 的代币应该放入第一个槽位,30% 应放入第二个槽位,20% 应放入第三个槽位。

我不知道代币的总数 - 它们不断增加。中午我可能会得到 1000 个代币来分发,下午 1 点我会得到另外 300 个代币。等等,难以预料。在任何时间点,我迄今为止收到的代币都应该根据权重尽可能分配。

一种解决方案是按概率分配。我为每个 token 掷一个 100 面骰子。如果结果是 1-50,则代币进入槽 1。结果 51-80 意味着槽 2,结果 81-100 意味着槽 3。

但这意味着每个 token 最终都出现在插槽 3 中并非不可能(只是不可能)。

我想保证,当我总共收到 100 个代币时,其中 50 个将位于插槽 1 中。当我收到 1000 个代币时,正好 500 个代币位于插槽 1 中。

什么是好的算法?

最佳答案

根据理想分布计算每个槽的误差。始终将 token 插入错误最多的插槽中。如果两个或多个插槽并列,则插入其中的随机一个。

误差为预期代币数量(添加代币*比例)与实际代币数量之间的差异。

这样您将始终最大限度地减少错误,并且如果代币能够准确分配,则不会出现错误。

演示代码(如果存在相同数量的错误,则插入到第一个槽中,而不是随机分布):

import random

tokens_in_slots = [0, 0, 0]
slot_distributions = [0.5, 0.3, 0.2]

def add_token():
    num_tokens = sum(tokens_in_slots)
    if not num_tokens:
        #first token can go anywhere
        tokens_in_slots[random.randint(0,2)] += 1
        return
    expected_tokens = [num_tokens*distr for distr in slot_distributions]
    errors = [expected - actual
              for expected, actual in zip(expected_tokens, tokens_in_slots)]
    most_error = max(enumerate(errors), key=lambda (i,e): e)
    tokens_in_slots[most_error[0]] += 1

def add_and_print(n):
    for i in xrange(n):
        add_token()
        print sum(tokens_in_slots), tokens_in_slots

结果:

>>> add_and_print(100)
1 [0, 0, 1]
2 [1, 0, 1]
3 [1, 1, 1]
4 [2, 1, 1]
5 [2, 2, 1]
6 [3, 2, 1]
7 [3, 2, 2]
8 [4, 2, 2]
9 [4, 3, 2]
10 [5, 3, 2]
11 [6, 3, 2]
12 [6, 4, 2]
13 [6, 4, 3]
14 [7, 4, 3]
15 [7, 5, 3]
16 [8, 5, 3]
17 [8, 5, 4]
18 [9, 5, 4]
19 [9, 6, 4]
20 [10, 6, 4]
21 [11, 6, 4]
22 [11, 7, 4]
23 [11, 7, 5]
24 [12, 7, 5]
25 [12, 8, 5]
26 [13, 8, 5]
27 [13, 8, 6]
28 [14, 8, 6]
29 [14, 9, 6]
30 [15, 9, 6]
31 [16, 9, 6]
32 [16, 10, 6]
33 [16, 10, 7]
34 [17, 10, 7]
35 [17, 11, 7]
36 [18, 11, 7]
37 [18, 11, 8]
38 [19, 11, 8]
39 [19, 12, 8]
40 [20, 12, 8]
41 [21, 12, 8]
42 [21, 13, 8]
43 [21, 13, 9]
44 [22, 13, 9]
45 [22, 14, 9]
46 [23, 14, 9]
47 [23, 14, 10]
48 [24, 14, 10]
49 [24, 15, 10]
50 [25, 15, 10]
51 [26, 15, 10]
52 [26, 16, 10]
53 [26, 16, 11]
54 [27, 16, 11]
55 [27, 17, 11]
56 [28, 17, 11]
57 [28, 17, 12]
58 [29, 17, 12]
59 [29, 18, 12]
60 [30, 18, 12]
61 [31, 18, 12]
62 [31, 19, 12]
63 [31, 19, 13]
64 [32, 19, 13]
65 [32, 20, 13]
66 [33, 20, 13]
67 [33, 20, 14]
68 [34, 20, 14]
69 [34, 21, 14]
70 [35, 21, 14]
71 [36, 21, 14]
72 [36, 22, 14]
73 [36, 22, 15]
74 [37, 22, 15]
75 [37, 23, 15]
76 [38, 23, 15]
77 [38, 23, 16]
78 [39, 23, 16]
79 [39, 24, 16]
80 [40, 24, 16]
81 [41, 24, 16]
82 [41, 25, 16]
83 [41, 25, 17]
84 [42, 25, 17]
85 [42, 26, 17]
86 [43, 26, 17]
87 [43, 26, 18]
88 [44, 26, 18]
89 [44, 27, 18]
90 [45, 27, 18]
91 [46, 27, 18]
92 [46, 28, 18]
93 [46, 28, 19]
94 [47, 28, 19]
95 [47, 29, 19]
96 [48, 29, 19]
97 [48, 29, 20]
98 [49, 29, 20]
99 [49, 30, 20]
100 [50, 30, 20]

结果

tokens_in_slots = [0, 0, 0, 0]
slot_distributions = [0.8, 0.1, 0.05, 0.05]

:

>>> add_and_print(100)
1 [0, 0, 1, 0]
2 [1, 0, 1, 0]
3 [2, 0, 1, 0]
4 [3, 0, 1, 0]
5 [3, 1, 1, 0]
6 [4, 1, 1, 0]
7 [5, 1, 1, 0]
8 [6, 1, 1, 0]
9 [7, 1, 1, 0]
10 [7, 1, 1, 1]
11 [8, 1, 1, 1]
12 [9, 1, 1, 1]
13 [10, 1, 1, 1]
14 [11, 1, 1, 1]
15 [11, 2, 1, 1]
16 [12, 2, 1, 1]
17 [13, 2, 1, 1]
18 [14, 2, 1, 1]
19 [15, 2, 1, 1]
20 [16, 2, 1, 1]
21 [17, 2, 1, 1]
22 [17, 3, 1, 1]
23 [18, 3, 1, 1]
24 [19, 3, 1, 1]
25 [20, 3, 1, 1]
26 [20, 3, 2, 1]
27 [21, 3, 2, 1]
28 [22, 3, 2, 1]
29 [23, 3, 2, 1]
30 [23, 3, 2, 2]
31 [24, 3, 2, 2]
32 [25, 3, 2, 2]
33 [26, 3, 2, 2]
34 [27, 3, 2, 2]
35 [27, 4, 2, 2]
36 [28, 4, 2, 2]
37 [29, 4, 2, 2]
38 [30, 4, 2, 2]
39 [31, 4, 2, 2]
40 [32, 4, 2, 2]
41 [33, 4, 2, 2]
42 [33, 5, 2, 2]
43 [34, 5, 2, 2]
44 [35, 5, 2, 2]
45 [36, 5, 2, 2]
46 [36, 5, 3, 2]
47 [37, 5, 3, 2]
48 [38, 5, 3, 2]
49 [39, 5, 3, 2]
50 [39, 5, 3, 3]
51 [40, 5, 3, 3]
52 [41, 5, 3, 3]
53 [42, 5, 3, 3]
54 [43, 5, 3, 3]
55 [43, 6, 3, 3]
56 [44, 6, 3, 3]
57 [45, 6, 3, 3]
58 [46, 6, 3, 3]
59 [47, 6, 3, 3]
60 [48, 6, 3, 3]
61 [49, 6, 3, 3]
62 [49, 7, 3, 3]
63 [50, 7, 3, 3]
64 [51, 7, 3, 3]
65 [52, 7, 3, 3]
66 [52, 7, 4, 3]
67 [53, 7, 4, 3]
68 [54, 7, 4, 3]
69 [55, 7, 4, 3]
70 [55, 7, 4, 4]
71 [56, 7, 4, 4]
72 [57, 7, 4, 4]
73 [58, 7, 4, 4]
74 [59, 7, 4, 4]
75 [59, 8, 4, 4]
76 [60, 8, 4, 4]
77 [61, 8, 4, 4]
78 [62, 8, 4, 4]
79 [63, 8, 4, 4]
80 [64, 8, 4, 4]
81 [65, 8, 4, 4]
82 [65, 9, 4, 4]
83 [66, 9, 4, 4]
84 [67, 9, 4, 4]
85 [68, 9, 4, 4]
86 [68, 9, 5, 4]
87 [69, 9, 5, 4]
88 [70, 9, 5, 4]
89 [71, 9, 5, 4]
90 [71, 9, 5, 5]
91 [72, 9, 5, 5]
92 [73, 9, 5, 5]
93 [74, 9, 5, 5]
94 [75, 9, 5, 5]
95 [75, 10, 5, 5]
96 [76, 10, 5, 5]
97 [77, 10, 5, 5]
98 [78, 10, 5, 5]
99 [79, 10, 5, 5]
100 [80, 10, 5, 5]

关于根据权重分配的算法,在项目总数未知的情况下,保证良好的分配?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/11832196/

相关文章:

java - 如何配置 Eclipse 来构建需要外部库的 Java 应用程序?

python - 使用Python对图像使用最大似然算法进行分割

r - 尝试将 f 分布拟合到向量

iphone - 通过网站分发 iPad 应用程序(.ipa 格式文件)

python - 为什么我的 Monty Hall 解决方案不起作用?

R : function to generate a mixture distribution

algorithm - 查找二叉树是否是二叉搜索树

ruby - 创建一系列数组元素的所有可能组合的字符串数组

algorithm - 关于数据压缩算法参数的问题

php 百分比机会