我正在使用霍夫曼代码对一组项目进行编码,并且除了最终代码之外,我还想返回编码的中间节点,但将子节点的数据连接到中间节点。
例如,如果我要对这组符号和概率进行编码:
tree = [('a',0.25),('b',0,25),('c',0.25),('d',0.125),('e',0.125)]
我希望返回以下内容:
tree = [['ab','0'],['cde','1'],['a','00'],['b','01'],['c','10'],['de','11'],['d','110'],['e','111']]
我使用以下代码来生成霍夫曼树:
import heapq
#symbfreq = list of symbols and associated frequencies
def encode(symbfreq):
#create a nested list as a tree
tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
#turn the tree into a heap
heapq.heapify(tree)
while len(tree)>1:
#pop the lowest two nodes off the heap, sorted on the length
lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
#add the next bit of the codeword
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
#push a new node, which is the sum of the two lowest probability nodes onto the heap
heapq.heappush(tree, [lo[0]+hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))
霍夫曼算法是:
1.Create a leaf node for each symbol and add it to the priority queue.
2.While there is more than one node in the queue:
3.Remove the node of highest priority (lowest probability) twice to get two nodes.
4.Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
5.Add the new node to the queue.
6.The remaining node is the root node and the tree is complete.
我一生都无法想出一种方法来阻止中间节点被覆盖(即我想保留在第 4 阶段创建的中间节点)。
最佳答案
我不知道在构建时如何做到这一点(构建永远不会弹出的输出树的一部分),但您可以很容易地检索中间节点:
huffman_tree = encode(tree)
complete_tree = huffman_tree
get_intermediate_node = lambda val, arr : ''.join( [ char for char,binary in itertools.ifilter( lambda node : node[1].startswith( val ),arr)] )
for val in range( next_power_of_two( len(huffman_tree) ) ):
bvalue = bin(val)[2:]
node = [ get_intermediate_node( bvalue , huffman_tree) , bvalue ]
if node not in complete_tree:
complete_tree.append( node)
print sorted( complete_tree , key=lambda p: (len(p[-1]), p) )
>>> [['ab', '0'], ['cde', '1'], ['a', '00'], ['b', '01'], ['c', '10'],
['de', '11'], ['', '100'], ['', '101'], ['d', '110'], ['e', '111']]
(您仍然需要修剪空节点)
关于python - 在霍夫曼算法中对中间叶进行编码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19766697/