python - 使用 Python 实现用于字符串匹配的 Knuth-Morris-Pratt (KMP) 算法

标签 python algorithm string-matching knuth-morris-pratt

我正在关注 Cormen Leiserson Rivest Stein (clrs) 的书,并遇到了用于字符串匹配的“kmp 算法”。我使用 Python(按原样)实现了它。

但是,由于某些原因,它似乎不起作用。哪里是我的错?

代码如下:

def kmp_matcher(t,p):
    n=len(t)
    m=len(p)
    # pi=[0]*n;
    pi = compute_prefix_function(p)
    q=-1
    for i in range(n):
        while(q>0 and p[q]!=t[i]):
            q=pi[q]
        if(p[q]==t[i]):
            q=q+1
        if(q==m):
            print "pattern occurs with shift "+str(i-m)
            q=pi[q]


def compute_prefix_function(p):
    m=len(p)
    pi =range(m)
    pi[1]=0
    k=0
    for q in range(2,m):
        while(k>0 and p[k]!=p[q]):
            k=pi[k]
        if(p[k]==p[q]):
            k=k+1
        pi[q]=k
    return pi

t = 'brownfoxlazydog'
p = 'lazy'
kmp_matcher(t,p)

最佳答案

这是我基于CLRs KMP算法写的一个类,里面有你要的。请注意,此处只接受 DNA“字符”。

class KmpMatcher(object):
def __init__(self, pattern, string, stringName):
    self.motif = pattern.upper()
    self.seq = string.upper()
    self.header = stringName
    self.prefix = []
    self.validBases = ['A', 'T', 'G', 'C', 'N']

#Matches the motif pattern against itself.
def computePrefix(self):
    #Initialize prefix array
    self.fillPrefixList()
    k = 0

    for pos in range(1, len(self.motif)):
        #Check valid nt
        if(self.motif[pos] not in self.validBases):
            self.invalidMotif()

        #Unique base in motif
        while(k > 0 and self.motif[k] != self.motif[pos]):
            k = self.prefix[k]
        #repeat in motif
        if(self.motif[k] == self.motif[pos]):
            k += 1

        self.prefix[pos] = k

#Initialize the prefix list and set first element to 0
def fillPrefixList(self):
    self.prefix = [None] * len(self.motif)
    self.prefix[0] = 0

#An implementation of the Knuth-Morris-Pratt algorithm for linear time string matching
def kmpSearch(self):
    #Compute prefix array
    self.computePrefix()
    #Number of characters matched
    match = 0
    found = False

    for pos in range(0, len(self.seq)):
        #Check valid nt
        if(self.seq[pos] not in self.validBases):
            self.invalidSequence()

        #Next character is not a match
        while(match > 0 and self.motif[match] != self.seq[pos]):
            match = self.prefix[match-1]
        #A character match has been found
        if(self.motif[match] == self.seq[pos]):
            match += 1
        #Motif found
        if(match == len(self.motif)):
            print(self.header)
            print("Match found at position: " + str(pos-match+2) + ':' + str(pos+1))
            found = True
            match = self.prefix[match-1]

    if(found == False):
        print("Sorry '" + self.motif + "'" + " was not found in " + str(self.header))

#An invalid character in the motif message to the user
def invalidMotif(self):
    print("Error: motif contains invalid DNA nucleotides")
    exit()

#An invalid character in the sequence message to the user
def invalidSequence(self):
    print("Error: " + str(self.header) + "sequence contains invalid DNA nucleotides")
    exit()

关于python - 使用 Python 实现用于字符串匹配的 Knuth-Morris-Pratt (KMP) 算法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37840651/

相关文章:

python - 如果 Python GTK 中的内容发生更改,则滚动到 Scrollable 的底部

python - 仅使用 Tensorflow 进行目标检测的区域提案网络训练

python - pip install tensorflow-gpu 在 python 3.5 中安装

javascript - 难以在 JavaScript 中实现简化的借用检查器

c# - 字符串操作 : How to replace a string with a specific pattern

php - preg_match 多个单词

python - 绘图中重叠的色阶

c++ - Leetcode 1366:堆缓冲区溢出

c - 插入排序数组

algorithm - 使用 Rabin-Karp 搜索字符串中的多个模式