python - Find all common contiguous substrings of two strings in Python

Tags: python regex string


I have two strings and I want to find all the word sequences they have in common. For example,

s1 = 'Today is a good day, it is a good idea to have a walk.'

s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'

Consider matching s1 against s2 (ignoring case and punctuation):

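'Today is' matches 'today is', but 'Today is a' does not match anything in s2. So 'today is' is one of the common contiguous word sequences. In the same way we get 'a good day', 'is', 'a good' and 'have a walk'. So the common phrases are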

common = ['today is', 'a good day', 'is', 'a good', 'have a walk']

Can we do this with regular expressions?

Many thanks.

Best answer

import string
s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
z = []
strip_punct = str.maketrans('', '', string.punctuation)
s1 = s1.translate(strip_punct)   # remove punctuation (Python 3 form of translate)
s2 = s2.translate(strip_punct)
print(s1)
print(s2)
sw1 = s1.lower().split()   # split into lowercase words
sw2 = s2.lower().split()
print(sw1, sw2)
i = 0
while i < len(sw1):   # while loop, because i is also advanced inside the inner loop
    x = 0     # number of words matched in the current run
    r = ""    # the current run of matched words
    d = i     # remember where this pass started in sw1
    for j in range(len(sw2)):
        # the bounds check guards against i running past the end of sw1
        if i < len(sw1) and sw1[i] == sw2[j]:
            r = r + ' ' + sw2[j]   # same word: extend the current run
            x += 1
            i += 1
        else:
            if x > 0:   # the run just ended: store it and rewind to the start word
                z.append(r.strip())
                i = d
                r = ""
                x = 0
    if x > 0:   # a run was still open when sw2 ended
        z.append(r.strip())
        i = d
    i += 1
print(list(set(z)))   # drop duplicates; the order of the result is arbitrary

#O(n^3)
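
For comparison, here is a minimal sketch of a greedy variant. It is not the answer's method. It assumes the question means: walk through s1 from left to right, take the longest run of words starting at the current word that also appears contiguously in s2, record it, and continue after that run. The names `words` and `common_phrases` are made up for illustration. A regular expression is used only to split the text into lowercase words; the matching itself is plain list comparison.

```python
import re

def words(text):
    # Lowercase the text and keep runs of letters, digits and apostrophes,
    # which drops punctuation such as ',' '.' and '?'.
    return re.findall(r"[a-z0-9']+", text.lower())

def common_phrases(s1, s2):
    w1, w2 = words(s1), words(s2)
    result = []
    i = 0
    while i < len(w1):
        best = 0  # longest run starting at w1[i] that also occurs in w2
        for j in range(len(w2)):
            k = 0
            while i + k < len(w1) and j + k < len(w2) and w1[i + k] == w2[j + k]:
                k += 1
            best = max(best, k)
        if best:
            phrase = " ".join(w1[i:i + best])
            if phrase not in result:  # keep first-seen order, no duplicates
                result.append(phrase)
            i += best  # continue after the matched run
        else:
            i += 1  # this word does not occur in s2 at all
    return result

s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
print(common_phrases(s1, s2))
# ['today is', 'a good day', 'is', 'a good', 'have a walk']
```

Under that reading, the result matches the list given in the question, in the order the phrases occur in s1. If every maximal shared run is wanted instead (including single words such as 'good'), the skip step `i += best` would have to change.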

Regarding python - Find all common contiguous substrings of two strings in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45911259/
