python-3.x - 在 Python 中使用 Sets Insert Delete Get Random O(1) 顺序算法的工作

在处理一个问题时，我发现集合有些奇怪。有人可以解释其背后的逻辑。每当我添加要设置的元素时，它都会以正确的顺序插入。鉴于该集合是无序数据结构，这怎么可能？

以下是我观察到的一个例子:

>>> a = set([1,3,5])
>>> a
{1, 3, 5}
>>> a.pop()
1
>>> a
{3, 5}
>>> a.add(4)
>>> a
{3, 4, 5}
>>> a.add(6)
>>> a
{3, 4, 5, 6}
>>> a.add(2)
>>> a
{2, 3, 4, 5, 6}
>>>

我是如何偶然发现这个观察结果的:

我试图解决这个问题，其中我必须设计一个数据结构来在 O(1) 时间内执行插入、删除和 getRandom。详情可以在https://leetcode.com/problems/insert-delete-getrandom-o1-duplicates-allowed/description/找到

我的基本想法是使用一个 HashMap of number -> List 存储插入列表中的所有键的索引列表。与它一起，我维护了一个值列表(V)。 List 和 HashMap 将允许不断插入。 HashMap 将允许不断删除值。如果我们将要删除的元素与列表的最后一个元素交换，然后删除最后一个元素，我们就可以实现从列表中不断删除值。

基本用例:

插入值 1。1 附加到 List(V)。该索引存储在HashMap中，key为1。
getRandom 是从 List(V) 中随机选取一个元素
要删除，值为1，从HashMap 中弹出最后一个索引为1，然后将此索引与新元素交换。然后在 HashMap 中更新交换元素的最后一个索引，并删除 List(V) 中的最后一个元素。

我面临的问题是我需要在 HashMap 中的正确位置插入交换元素的新索引，以使该算法正常工作。

但有趣的是，当我在 HashMap 中使用集合而不是列表时，我不需要处理它。该集合以某种方式将元素插入正确的位置。我知道那个集合应该是一个无序的数据集，那么为什么它会起作用。有人可以解释集合的这种行为吗？

以下是使用列表的代码，我必须使用二进制搜索将交换的索引插入适当的位置。这里绝对删除不是O(1)

    import random
    import bisect
    class RandomizedCollection:

        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.myMap = {}
            self.stack = []

        def insert(self, val):
            """
            Inserts a value to the collection. Returns true if the collection did not already contain the specified element.
            :type val: int
            :rtype: bool
            """
            #print("Inserting",val)
            #print(self.myMap,self.stack)
            tmp = self.myMap.get(val,[])
            if len(tmp) == 0:
                self.stack.append(val)
                tmp.append(len(self.stack)-1)
                self.myMap[val] = tmp
                return True
            else:
                self.stack.append(val)
                tmp.append(len(self.stack)-1)
                self.myMap[val] = tmp
                return False

        def remove(self, val):
            """
            Removes a value from the collection. Returns true if the collection contained the specified element.
            :type val: int
            :rtype: bool
            """
            #print("Removing",val)
            #print(self.myMap,self.stack)
            tmp = self.myMap.get(val,[])
            if len(tmp) > 0:
                if self.stack[-1] != val:
                    idx_to_remove = tmp.pop()
                    last_val = self.stack[-1]
                    #print(idx_to_remove, last_val)

                    self.myMap[last_val].pop() ## removes the last index
                    insert_pos = bisect.bisect_left(self.myMap[last_val],idx_to_remove)
                    self.myMap[last_val].insert(insert_pos,idx_to_remove)

                    self.stack[idx_to_remove],self.stack[-1] = self.stack[-1],self.stack[idx_to_remove]
                    self.stack.pop()
                else:
                    self.stack.pop()
                    tmp.pop()
                return True
            else:
                return False


        def getRandom(self):
            """
            Get a random element from the collection.
            :rtype: int
            """
            return random.choice(self.stack)

以下是使用 Sets 的类似代码。我不确定为什么这会起作用。

from collections import defaultdict
import random

class RandomizedCollection:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.nums = []
        self.num_map = defaultdict(set)


    def insert(self, val):
        """
        Inserts a value to the collection. Returns true if the collection did not already contain the specified element.
        :type val: int
        :rtype: bool
        """
        self.nums.append(val)
        self.num_map[val].add(len(self.nums) - 1)
        return True


    def remove(self, val):
        """
        Removes a value from the collection. Returns true if the collection contained the specified element.
        :type val: int
        :rtype: bool
        """
        if len(self.num_map[val]) == 0:
            return False
        index = self.num_map[val].pop()
        last_index = len(self.nums) - 1
        if not (index == last_index):
            last_val = self.nums[last_index]
            self.nums[index] = last_val
            self.num_map[last_val].remove(last_index)
            self.num_map[last_val].add(index)
        self.nums.pop()
        return True


    def getRandom(self):
        """
        Get a random element from the collection.
        :rtype: int
        """
        return self.nums[random.randint(0, len(self.nums) - 1)]

最佳答案

python 中的集合无法保证顺序。事实上，不维护秩序是很常见的。例如，在我的 Python-2.7 和 3.5.2 实现上:

>>> a = set([3,10,19])
>>> a
set([19, 10, 3])
>>> a.add(1)
>>> a
set([19, 1, 10, 3])
>>> a.pop()
19
>>> a.pop()
1
>>> a
set([10, 3])

有时使用 set 维护订单因为这是哈希表的工作方式。通常是 set以具有 size=a*len(hash_table) 的哈希表开始水桶。元素 value被插入桶号 value % size .对于小而密集的整数 value < size .这意味着在这种情况下 value插入桶号 value ，这是值的排序顺序。

这是为了表明当 set 时应该不足为奇在某些情况下保持排序顺序，但这不是重点。重点是为什么 RandomizedCollection类(class)作品。

实际上，RandomizedCollection 的两个实现具有相同的概念数据结构。 self.stack基于列表的变体的读取和更新方式与 self.nums 完全相同的 set基于变体。它们都维护了 RandomizedCollection 中所有元素的平面列表。目的。

基于列表的实现使用了 self.myMap[val] 的事实是仅在 remove() 中的排序列表在:

last_val = self.stack[-1]
self.myMap[last_val].pop() ## removes the last index

自 self.myMap[last_val]是一个排序的索引列表，它的最后一个元素必须是最后一个 last_val self.stack 中的元素.自 last_val取自 self.stack 的末尾, 然后是 self.myMap[last_val] 的最后一个元素保证指向 len(self.stack)-1) .这意味着弹出两个 self.stack 的最后一个元素和 self.myMap[last_val]将保持数据结构一致。除了以上两行，制作self.myMap[val]排序列表只会产生成本。在列表中搜索索引是O(log K)插入是 O(K) .

set基于的解决方案适用于不同的顺序。而不是通过以下方式删除指向 O(1) 中堆栈末尾的索引:

self.myMap[last_val].pop() # (Sorted list variant)

它使用 set 的 O(1) 能力删除相同的元素:

self.num_map[last_val].remove(last_index)

效果是一样的。在这两种情况下，映射不再指向堆栈中的最后一个元素。不需要 set要排序，只能在 O(1) 中删除元素。最终，堆栈中的最后一个元素将被一个简单的 self.stack.pop() 删除。或 self.nums.pop() .

有时不是必须删除的最后一个元素，但代码还是删除了它。这个想法是然后删除 正确的 元素，并将“错误”删除的元素放在它的位置。排序列表的解决方案在 O(K) 中努力工作以重新插入删除的元素:

insert_pos = bisect.bisect_left(self.myMap[last_val],idx_to_remove)
self.myMap[last_val].insert(insert_pos,idx_to_remove)

在 set 的情况下这很简单，因为不需要顺序:

self.num_map[last_val].add(index)

最后，包含实际值(self.nums 和 self.stack)的数组被更新，以便最后一个元素最终被删除，并重新插入而不是首先应该被删除的元素。

总而言之，两种数据结构是等价的。在一种情况下，索引集作为排序列表维护，在另一种情况下作为 set 维护。 .排序列表变体中的代码有时会利用列表已排序的事实，以便在 O(1) 中删除最大的元素。但是，对于 set如果不需要这样的优化，因为删除任何元素都是 O(1)，并且 remove()可以用来代替 pop() .

关于python-3.x - 在 Python 中使用 Sets Insert Delete Get Random O(1) 顺序算法的工作，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/52012128/

python-3.x - 在 Python 中使用 Sets Insert Delete Get Random O(1) 顺序算法的工作

上一篇：java - 在 Java 中使用归并排序对链表进行排序

下一篇：algorithm - 给定一个包含两个整数作为节点的查询，在树中找到这两个节点的所有子节点？