python - Trie 实现的内存高效数据结构

我正在用 python 实现一个 Trie。到目前为止，我遇到了两种不同的实现方法:

1) 使用带有数据成员的 Node 类(类似于 C++ 中的 struct Node):

char - 存储字符
is_end - 存储单词结尾(真或假)
prefix_count - 存储当前前缀的单词数

child - 节点类型字典(用于存储其他节点，即 26 个字母表)

class Node(object):
    def __init__(self):
        self.char = ''
        self.word = ''
        self.is_end = False
        self.prefix_count = 0
        self.child = {}

2) 使用字典存储所有数据:

例如对于输入 words = {'foo', 'bar', 'baz', 'barz'}

我们导出这个字典:

         {'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
          'z': {'_end_': '_end_'}}},
          'f': {'o': {'o': {'_end_': '_end_'}}}}

对于大型单词数据集的遍历和其他 trie 操作，哪种高效且标准的数据结构既节省内存又快速？

最佳答案

为什么不两者兼而有之？就在昨天，我正在实现一个类似的数据结构来存储和检索对象的层次结构，并考虑了这种确切的情况。最终使用带有子字典的 Node 对象。作为对象的主节点允许您使用自定义方法来打印它或获取东西，如果需要，您甚至可以进行延迟初始化(您提到了大数据集，对吧？)

class Node(object):
    def __init__(self):
        self.children = collections.defaultdict(lambda:self.__class__())
        self.stuff = ...
    def __str__(self,depth=0):
        if self.children:
            return str(self.stuff)+'\n'+'\n'.join([' '*depth+k+' ' + v.__str__(depth+1) for k,v in self.children.items()])
        else:
            return str(self.stuff)
    def get_path(self,path):
        node = self
        while path:
            node = node.children[path.pop(0)]
        ...

关于python - Trie 实现的内存高效数据结构，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/29734260/

python - Trie 实现的内存高效数据结构

1) 使用带有数据成员的 Node 类(类似于 C++ 中的 struct Node):

2) 使用字典存储所有数据:

上一篇：python - 使用 mechanize 将文件上传到 Sharepoint 失败，我不明白为什么

下一篇：Python:更新heapq中元素的值