python - 在霍夫曼算法中对中间叶进行编码

标签 python algorithm data-structures huffman-code

我正在使用霍夫曼代码对一组项目进行编码,并且除了最终代码之外,我还想返回编码的中间节点,但将子节点的数据连接到中间节点。

例如,如果我要对这组符号和概率进行编码:

tree = [('a',0.25),('b',0,25),('c',0.25),('d',0.125),('e',0.125)]

我希望返回以下内容:

tree = [['ab','0'],['cde','1'],['a','00'],['b','01'],['c','10'],['de','11'],['d','110'],['e','111']]

我使用以下代码来生成霍夫曼树:

import heapq

#symbfreq = list of symbols and associated frequencies
def encode(symbfreq):
    #create a nested list as a tree
    tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
    #turn the tree into a heap
    heapq.heapify(tree)
    while len(tree)>1:
        #pop the lowest two nodes off the heap, sorted on the length
        lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
        #add the next bit of the codeword
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        #push a new node, which is the sum of the two lowest probability nodes onto the heap 
        heapq.heappush(tree, [lo[0]+hi[0]] + lo[1:] + hi[1:])
    return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))

霍夫曼算法是:

1.Create a leaf node for each symbol and add it to the priority queue.
2.While there is more than one node in the queue:
    3.Remove the node of highest priority (lowest probability) twice to get two nodes.
    4.Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
    5.Add the new node to the queue.
6.The remaining node is the root node and the tree is complete.

我一生都无法想出一种方法来阻止中间节点被覆盖(即我想保留在第 4 阶段创建的中间节点)。

最佳答案

我不知道在构建时如何做到这一点(构建永远不会弹出的输出树的一部分),但您可以很容易地检索中间节点:

huffman_tree = encode(tree)
complete_tree = huffman_tree

get_intermediate_node = lambda val, arr :   ''.join( [ char for char,binary in itertools.ifilter( lambda node : node[1].startswith( val ),arr)] ) 

for val in range( next_power_of_two( len(huffman_tree) ) ):
    bvalue = bin(val)[2:] 
    node = [ get_intermediate_node( bvalue , huffman_tree) , bvalue ] 
    if node not in complete_tree:
        complete_tree.append( node)

print sorted( complete_tree , key=lambda p: (len(p[-1]), p) )

>>> [['ab', '0'], ['cde', '1'], ['a', '00'], ['b', '01'], ['c', '10'],
    ['de', '11'], ['', '100'], ['', '101'], ['d', '110'], ['e', '111']]

(您仍然需要修剪空节点)

关于python - 在霍夫曼算法中对中间叶进行编码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19766697/

相关文章:

Java:由于字符而从数组中删除项目

c - 当其中一个字段是数组时,如何初始化结构的所有字段?

python - 在Python中使用正则表达式搜索从PDF转换的类(class)

python - 我们可以在 Hadoop Streaming 中级联多个 MapReduce 作业吗(lang : Python)

algorithm - 白平衡算法

algorithm - 如果循环变量除/乘以恒定量,为什么我们将时间复杂度视为 O(Logn)?

c# - 如何将有向无环图 (DAG) 转换为树

python - 如何读取文件夹中的图像并存储其类别号和图像数据? Python

python - python 中的期望值

java - 我自己的快速排序版本似乎只适用于小数组