python - 用更好的结构简化 for-if 困惑?

标签 python markov-chains

<分区>

请将此问题移至 Code Review -area .它更适合那里,因为我知道下面的代码是垃圾,我想要重要的反馈来完成重写。我几乎是在重新发明轮子。

# Description: you are given a bitwise pattern and a string
# you need to find the number of times the pattern matches in the string.
# The pattern is determined by markov chain.
# For simplicity, suppose the ones and zeros as unbiased coin flipping
# that stops as it hits the pattern, below.
#
# Any one liner or simple pythonic solution?

import random

def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

        count = 0
        matchTimes = 0

        # How can you simplify the for-if structures?
        # THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label
        # please, read clarifications in [Update]

        for coin in yourString:
            #return to base
            if  count == len(pattern):
                    matchTimes = matchTimes + 1
                    count = 0

            #special case to return to 2, there could be more this type of conditions
            #so this type of if-conditionals are screaming for a havoc
            if count == 2 and pattern[count] == 1:
                    count = count - 1

            #the work horse
            #it could be simpler by breaking the intial string of lenght 'l'
            #to blocks of pattern-length, the number of them is 'l - len(pattern)-1'
            if coin == pattern[count]:
                    count=count+1

        average = len(yourString)/matchTimes

        return [average, matchTimes]



# Generates the list
myString =[]
for x in range(10000):
    myString= myString + [int(random.random()*2)]

pattern = [1,0,0]
result = matchIt(myString, pattern)

print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" +
        "So it took "+str(result[0])+" steps in average.\n" +
        "RESULT: "+str([a for a in "FAILURE" if result[0] != 8]))


# Sample Output
# 
# The sample had 1656 matches and its size was 10000.
# So it took 6 steps in average.
# RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E']

[更新]

我会在这里解释一下理论,也许这样可以简化问题。上面的代码尝试用下面的转移矩阵 A 构造马尔可夫链。您可以想象为掷硬币的模式 100 对应于它。

>>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0')     
>>> I=numpy.identity(3)
>>> I
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> Q
matrix([[ 0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0.5],
        [ 0. ,  0.5,  0. ]])
>>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1')
>>> A
matrix([[ 0.5,  0.5,  0. ,  0. ],
        [ 0. ,  0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0. ,  0.5],
        [ 0. ,  0. ,  0. ,  1. ]])

问题中的average 8 成为矩阵N=(I-Q)^-1 中第一行值的总和其中Q在上面。

>>> (I-Q)**-1
matrix([[ 2.,  4.,  2.],
        [ 0.,  4.,  2.],
        [ 0.,  2.,  2.]])
>>> numpy.sum(((I-Q)**-1)[0])
8.0

现在,您可能会看到这个显然只是模式匹配的问题变成了一个马尔可夫链。我看不出为什么你不能用类似于矩阵或矩阵的东西来代替困惑的 for-while-if 条件。我不知道如何实现它们,但迭代器可能是一种研究方法,特别是对于需要分解的更多状态。

但是Numpy出现了一个问题,-InfNaN是干什么用的?从上面的 (I-Q)**-1 矩阵中检查它们应该收敛到的值。 N 来自 N=I+Q+Q^2+Q^3+...=\frac{I-Q^{n}}{I-Q}

>>> (I-Q**99)/(I-Q)
matrix([[  2.00000000e+00,   1.80853571e-09,             -Inf],
        [             NaN,   2.00000000e+00,   6.90799171e-10],
        [             NaN,   6.90799171e-10,   1.00000000e+00]])
>>> (I-Q**10)/(I-Q)
matrix([[ 1.99804688,  0.27929688,        -Inf],
        [        NaN,  1.82617188,  0.10742188],
        [        NaN,  0.10742188,  0.96679688]])

最佳答案

def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

您可以使用以下内容吗?

yourString.count(yourPattern)

在您的情况下,您可以将 myString 创建为 10000 个字符的真实字符串,并将 pattern 也创建为字符串,然后以简单的 pythonic 方式计算出现次数。

编辑

一个单行代码,为您提供 text(可以是字符串或列表)中 pattern 的(重叠)出现次数,可能看起来像这个:

nbOccurences = sum(1 for i in xrange(len(text)-len(pattern)) if text[i:i+len(pattern)] == pattern)

关于python - 用更好的结构简化 for-if 困惑?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4670873/

相关文章:

python - 返回 3D scipy.spatial.Delaunay 的表面三角形

python - 用元组列表填充数据框列

python - 如何从模板调用 Django 函数并将返回值保存在变量中

Python Regex 与第一行不匹配

R : function to generate a mixture distribution

algorithm - J48 和马尔可夫链之间的区别

java - 计算值太大而无法求幂的马尔可夫链概率

hidden-markov-models - 马尔可夫链和隐马尔可夫模型有什么区别?

python - Tensorflow CNN 'tuple' 对象没有属性 'initializer'

java - Google Foobar Challenge Doomsday Fuel 的隐藏测试用例未通过