python - Beam search algorithm for decoding RNN output

Tags: python algorithm speech-recognition beam-search

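I've been trying to understand the logic of the beam search algorithm used in the decoding step of automatic speech recognition. The papers I've tried to follow are First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs, Lexicon-Free Conversational Speech Recognition with Neural Networks, and Towards End-to-End Speech Recognition with Recurrent Neural Networks. The problem is that the idea behind the algorithm is not easy to follow, and the pseudocode in the papers contains many typos. Also, this implementation of the second paper is very hard to follow, and this one, from the last paper mentioned, doesn't include a language model.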
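All three papers decode the output of a network trained with CTC, where a frame-level path is mapped to a transcription by first merging consecutive repeated labels and then removing blanks. A minimal sketch of that mapping, plus greedy best-path decoding as a baseline, assuming the blank is column 0 of the probability matrix (as with the question's `'-' + alphabet` and the answer's `probs[t-1][0]`); `collapse` and `greedy_decode` are illustrative names, not taken from the papers:

```python
import numpy as np


def collapse(path, blank=0):
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for label in path:
        # keep a label only if it differs from the previous frame and is not blank
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def greedy_decode(probs, alphabet, blank=0):
    """Best-path decoding: take the argmax label of each frame, then collapse."""
    best_path = np.argmax(probs, axis=1).tolist()
    return ''.join(alphabet[i] for i in collapse(best_path, blank))
```

Greedy decoding is not the same as finding the most probable transcription. For the 2-frame input `[[0.6, 0.4], [0.6, 0.4]]` over `'-a'` it returns `''` (path probability 0.36), yet `'a'` has total probability 0.24 + 0.24 + 0.16 = 0.64 summed over the paths `a-`, `-a` and `aa`. Summing over paths that share a prefix is exactly what prefix beam search does.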

Here is my implementation in Python, which fails because some probabilities are missing:

```python
class BeamSearch(object):
    """
    Decoder for audio to text.

    From: https://arxiv.org/pdf/1408.2873.pdf (hardcoded)
    """
    def __init__(self, alphabet='" abcdefghijklmnopqrstuvwxyz'):
        # blank symbol
        self.blank = '-'
        # blank symbol plus alphabet
        self.alphabet = self.blank + alphabet
        # index of each char
        self.char_to_index = {c: i for i, c in enumerate(self.alphabet)}

    def decode(self, probs, k=100):
        """
        Decoder.

        :param probs: matrix of size Windows X AlphaLength
        :param k: beam size
        :returns: most probable prefix in A_prev
        """
        # List of prefixes (the beam), initialized with the empty string
        A_prev = ['']
        # Probability of a prefix at window time t ending in blank
        p_b = {('', 0): 1.0}
        # Probability of a prefix at window time t not ending in blank
        p_nb = {('', 0): 0.0}

        # for each time window t
        for t in range(1, probs.shape[0] + 1):
            A_new = []
            # for each prefix in the beam
            for s in A_prev:
                for c in self.alphabet:
                    if c == self.blank:
                        p_b[(s, t)] = probs[t-1][self.char_to_index[self.blank]] * \
                            (p_b[(s, t-1)] + p_nb[(s, t-1)])
                        A_new.append(s)
                    else:
                        s_new = s + c
                        # repeated chars
                        if len(s) > 0 and c == s[-1]:
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] * \
                                p_b[(s, t-1)]
                            p_nb[(s, t)] = probs[t-1][self.char_to_index[c]] * \
                                p_b[(s, t-1)]
                        # spaces
                        elif c == ' ':
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] * \
                                (p_b[(s, t-1)] + p_nb[(s, t-1)])
                        else:
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] * \
                                (p_b[(s, t-1)] + p_nb[(s, t-1)])
                            p_nb[(s, t)] = probs[t-1][self.char_to_index[c]] * \
                                (p_b[(s, t-1)] + p_nb[(s, t-1)])
                        if s_new not in A_prev:
                            p_b[(s_new, t)] = probs[t-1][self.char_to_index[self.blank]] * \
                                (p_b[(s, t-1)] + p_nb[(s, t-1)])
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] * \
                                p_nb[(s, t-1)]
                        A_new.append(s_new)
            # score each prefix by total probability times its length, keep the k best
            s_probs = map(lambda x: (x, (p_b[(x, t)] + p_nb[(x, t)]) * len(x)), A_new)
            xs = sorted(s_probs, key=lambda x: x[1], reverse=True)[:k]
            A_prev, best_probs = zip(*xs)
        return A_prev[0], best_probs[0]
```
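For comparison, here is a minimal sketch of CTC prefix beam search without a language model, assuming `probs` is a T x V array of per-frame softmax outputs with the blank in column 0; `prefix_beam_search` is an illustrative name. It keeps the question's `p_b`/`p_nb` split but stores them in `collections.defaultdict(float)`, so a prefix that was never reached reads as probability 0 instead of raising a `KeyError`, and every update accumulates with `+=` because several paths can reach the same prefix:

```python
from collections import defaultdict
import numpy as np


def prefix_beam_search(probs, alphabet, k=10, blank=0):
    """CTC prefix beam search in probability space, no language model (sketch).

    probs: T x V array of per-frame softmax outputs.
    alphabet: string of length V; alphabet[i] labels column i (alphabet[blank] is unused).
    Returns (best_prefix, p_b + p_nb of that prefix).
    """
    T, V = probs.shape
    # unseen prefixes default to probability 0, so lookups at t-1 never fail
    p_b, p_nb = defaultdict(float), defaultdict(float)
    p_b[''] = 1.0
    beam = ['']
    for t in range(T):
        next_b, next_nb = defaultdict(float), defaultdict(float)
        in_beam = set(beam)
        for s in beam:
            total = p_b[s] + p_nb[s]
            # a blank keeps the prefix unchanged and ends the path in blank
            next_b[s] += probs[t, blank] * total
            for i in range(V):
                if i == blank:
                    continue
                p, c = probs[t, i], alphabet[i]
                s_new = s + c
                if s and c == s[-1]:
                    # a repeated char only starts a new symbol after a blank ...
                    next_nb[s_new] += p * p_b[s]
                    # ... otherwise it merges into the last char of s
                    next_nb[s] += p * p_nb[s]
                else:
                    next_nb[s_new] += p * total
                if s_new not in in_beam:
                    # s_new was not extended at t-1 (pruned or new): carry over its own
                    # mass, mirroring the `if s_new not in A_prev` step in the question
                    next_b[s_new] += probs[t, blank] * (p_b[s_new] + p_nb[s_new])
                    next_nb[s_new] += p * p_nb[s_new]
        # keep the k prefixes with the highest total probability
        candidates = set(next_b) | set(next_nb)
        beam = sorted(candidates, key=lambda x: next_b[x] + next_nb[x], reverse=True)[:k]
        p_b, p_nb = next_b, next_nb
    best = beam[0]
    return best, p_b[best] + p_nb[best]
```

On `[[0.6, 0.4], [0.6, 0.4]]` with alphabet `'-a'` this returns `'a'` with probability of about 0.64. Unlike the question's code, it ranks prefixes by probability alone; a length or language-model term is better applied as a separate score on top.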

Any help would be appreciated.

Best answer


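I implemented beam search with -inf initialization, and also followed the ctc_beam_search algorithm from the paper http://proceedings.mlr.press/v32/graves14.pdf. My version is almost the same as yours, except for how p_b is updated for characters. The algorithm runs correctly, and even your algorithm works once this initialization is present.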
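-inf is the value a prefix with no supporting paths gets when scores are kept as log-probabilities (log 0 = -inf), which is also the usual way to avoid underflow when multiplying many per-frame probabilities. A minimal log-space sketch of the same prefix recursion, assuming log-softmax inputs with the blank in column 0; `log_prefix_beam_search` is an illustrative name, and the pruned-prefix carry-over step is left out for brevity, so this is not a transcription of the Graves & Jaitly pseudocode:

```python
from collections import defaultdict
import numpy as np

NEG_INF = -np.inf  # log(0): the score of a prefix with no paths


def log_prefix_beam_search(log_probs, alphabet, k=10, blank=0):
    """CTC prefix beam search in log space, no language model (sketch).

    log_probs: T x V array of per-frame log-softmax outputs.
    Unseen prefixes default to -inf, so that is the only initialization needed;
    adding probabilities becomes np.logaddexp of log-probabilities.
    """
    T, V = log_probs.shape
    lp_b = defaultdict(lambda: NEG_INF)   # log P(prefix, path ends in blank)
    lp_nb = defaultdict(lambda: NEG_INF)  # log P(prefix, path ends in non-blank)
    lp_b[''] = 0.0  # log(1): before any frame only the empty prefix exists
    beam = ['']
    for t in range(T):
        next_b = defaultdict(lambda: NEG_INF)
        next_nb = defaultdict(lambda: NEG_INF)
        for s in beam:
            total = np.logaddexp(lp_b[s], lp_nb[s])
            # blank: same prefix, path now ends in blank
            next_b[s] = np.logaddexp(next_b[s], log_probs[t, blank] + total)
            for i in range(V):
                if i == blank:
                    continue
                lp, c = log_probs[t, i], alphabet[i]
                s_new = s + c
                if s and c == s[-1]:
                    # repeated char: new symbol only after a blank, else merged into s
                    next_nb[s_new] = np.logaddexp(next_nb[s_new], lp + lp_b[s])
                    next_nb[s] = np.logaddexp(next_nb[s], lp + lp_nb[s])
                else:
                    next_nb[s_new] = np.logaddexp(next_nb[s_new], lp + total)
        # keep the k prefixes with the highest total log-probability
        scores = {x: np.logaddexp(next_b[x], next_nb[x]) for x in set(next_b) | set(next_nb)}
        beam = sorted(scores, key=scores.get, reverse=True)[:k]
        lp_b, lp_nb = next_b, next_nb
    best = beam[0]
    return best, float(np.logaddexp(lp_b[best], lp_nb[best]))
```

With `k` at least as large as the number of distinct prefixes nothing is pruned, and `exp` of the returned score is the exact CTC probability of the best labeling; for `np.log([[0.6, 0.4], [0.6, 0.4]])` over `'-a'` that is `'a'` with probability 0.64.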

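```python
# alphabets, char_map, probs, pr and pW come from the rest of the answerer's code (not shown).
# Presumably pr(frame_probs, c, s, t) scores extending prefix s with char c at frame t,
# and pW(s) is a word-level language-model factor applied when a space is emitted.
p_b, p_nb = {}, {}
A_prev = ['']
p_b[('', 0)] = 1
p_nb[('', 0)] = 0
for alphabet in alphabets:
    p_b[(alphabet, 0)] = -float("inf")
    p_nb[(alphabet, 0)] = -float("inf")
for t in range(1, probs.shape[0] + 1):
    A_new = []
    for s in A_prev:
        if s != '':
            try:
                # s is unchanged: its last char is repeated at frame t
                p_nb[(s, t)] = p_nb[(s, t-1)] * probs[t-1][char_map[s[-1:]]]
            except KeyError:
                # last char is not a key of char_map (a space): use '<SPACE>' plus the LM factor
                p_nb[(s, t)] = p_nb[(s, t-1)] * probs[t-1][char_map['<SPACE>']] * pW(s)
            if s[:-1] in A_prev:
                p_nb[(s, t)] = p_nb[(s, t)] + pr(probs[t-1], s[-1:], s[:-1], t)
        # blank at frame t: column 0 is the blank
        p_b[(s, t)] = (p_nb[(s, t-1)] + p_b[(s, t-1)]) * probs[t-1][0]
        if s == '':
            p_nb[(s, t)] = 0
        if s not in A_new:
            A_new.append(s)
        for c in alphabets:
            s_new = s + c
            p_b[(s_new, t)] = 0
            p_nb[(s_new, t)] = pr(probs[t-1], c, s, t)
            if s_new not in A_new:
                A_new.append(s_new)
    s_probs = map(lambda x: (x, (p_b[(x, t)] + p_nb[(x, t)])), A_new)
    # keep the k best prefixes for the next frame (k = beam size, as in the question)
    A_prev = [x for x, _ in sorted(s_probs, key=lambda x: x[1], reverse=True)[:k]]
```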
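The answer multiplies in `pW(s)` when a space closes a word, i.e. it folds a word-level language model into the search. The Hannun et al. papers, as I read them, weight the language model with an exponent α and add a length term weighted by β. Below is a hedged sketch of a final ranking score in the Deep Speech 2 form Q = log p_ctc + α·log p_lm + β·(word count), a close variant swapped in here rather than the papers' exact formula; `rescore`, `lm_log_prob` and the default α/β values are hypothetical:

```python
import math


def rescore(prefix, ctc_prob, lm_log_prob, alpha=0.5, beta=1.0):
    """Combine a CTC prefix probability with a word-level LM score (sketch).

    Q = log p_ctc + alpha * log p_lm(words) + beta * word_count
    ctc_prob: p_b + p_nb of the prefix after the last frame.
    lm_log_prob: hypothetical callable mapping a list of words to log p_lm;
        it stands in for the answer's pW and is not a real library API.
    alpha, beta: assumed LM weight and word-insertion bonus, normally tuned
        on held-out data.
    """
    if ctc_prob <= 0.0:
        return -math.inf  # log(0): the prefix is impossible under the acoustic model
    words = prefix.split()
    return math.log(ctc_prob) + alpha * lm_log_prob(words) + beta * len(words)
```

With a toy LM that prefers "the cat" (log 0.01) over "the cap" (log 0.0001), `rescore('the cat', 0.3, lm)` beats `rescore('the cap', 0.4, lm)`, so the language model can overturn the acoustic ranking. The β·(word count) term plays the role that `* len(x)` tries to play in the question's code, counteracting the LM's bias toward short transcriptions.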

Original Stack Overflow question (python - Beam search algorithm for decoding RNN output): https://stackoverflow.com/questions/42793080/
