You are given a string like this:
input_string = """
HIYourName=this is not true
HIYourName=Have a good day
HIYourName=nope
HIYourName=Bye!"""
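Find the most common substring in the file. The answer here is "HIYourName=". Note that the challenging part is that HIYourName= is not itself a "word" in the string, i.e. it is not delimited by whitespace on both sides.
So, to clarify, this is not the most-common-word problem.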
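To see why a word count does not help here, the following minimal sketch splits the example on whitespace and counts the tokens. The variable names `word_counts`, `top_word`, and `top_count` are only illustrative.

```python
from collections import Counter

input_string = """
HIYourName=this is not true
HIYourName=Have a good day
HIYourName=nope
HIYourName=Bye!"""

# Split on whitespace and count the resulting tokens.
word_counts = Counter(input_string.split())

# The prefix is glued to the word that follows it, so it never
# appears as a token of its own.
top_word, top_count = word_counts.most_common(1)[0]
print(top_word, top_count)
print("HIYourName=" in word_counts)
```

Every token in this example occurs exactly once, and "HIYourName=" is not among them, so a word-based count cannot surface the repeated prefix.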
Best answer
Here is a simple brute-force solution:
from collections import Counter

s = " HIYourName=this is not true HIYourName=Have a good day HIYourName=nope HIYourName=Bye!"

for n in range(1, len(s)):
    # Count every substring of length n (overlapping windows included).
    substr_counter = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    phrase, count = substr_counter.most_common(1)[0]
    if count == 1:  # early out: no substring of this length repeats
        break
    print('Size: %3d: Occurrences: %3d Phrase: %r' % (n, count, phrase))
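The output for the example string is shown below. Where several substrings tie for the top count, which one is reported depends on the Python version: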
Size:   1: Occurrences:  10 Phrase: ' '
Size:   2: Occurrences:   4 Phrase: 'Na'
Size:   3: Occurrences:   4 Phrase: 'Nam'
Size:   4: Occurrences:   4 Phrase: 'ourN'
Size:   5: Occurrences:   4 Phrase: 'HIYou'
Size:   6: Occurrences:   4 Phrase: 'IYourN'
Size:   7: Occurrences:   4 Phrase: 'urName='
Size:   8: Occurrences:   4 Phrase: ' HIYourN'
Size:   9: Occurrences:   4 Phrase: 'HIYourNam'
Size:  10: Occurrences:   4 Phrase: ' HIYourNam'
Size:  11: Occurrences:   4 Phrase: ' HIYourName'
Size:  12: Occurrences:   4 Phrase: ' HIYourName='
Size:  13: Occurrences:   2 Phrase: 'e HIYourName='
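To reduce the per-size results to a single answer, one option is to keep the longest substring that still reaches a chosen occurrence count. The sketch below is an assumption layered on the brute-force loop, not part of the original answer; the function name `longest_substring_with_count` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def longest_substring_with_count(s, min_count):
    """Return the longest substring of s that occurs at least min_count
    times (overlapping occurrences are counted), or "" if there is none."""
    best = ""
    for n in range(1, len(s) + 1):
        # Count every window of length n.
        counter = Counter(s[i:i + n] for i in range(len(s) - n + 1))
        phrase, count = counter.most_common(1)[0]
        if count < min_count:
            # The top count can only stay equal or drop as n grows,
            # so no longer substring can qualify.
            break
        best = phrase
    return best


s = " HIYourName=this is not true HIYourName=Have a good day HIYourName=nope HIYourName=Bye!"
result = longest_substring_with_count(s, 4)
print(repr(result.strip()))
```

With `min_count=4` this stops at size 12, matching the table above, and stripping the leading space leaves the expected prefix. Like the loop it is built on, it rebuilds a counter for every length, so it is only practical for short inputs.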
About python - finding the most common substring pattern in a file: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25071766/