I have the following string:
commit a8c11fcee68881dfb86095aa36290fb304047cf1
log size 110
Author: XXXXXX XXXXXXXX <XXXXXXXXXXXXXXX@XXXXX.XXX>
Date:   Tue, 10 Apr 2012 11:19:44 +0300

    First commit
3       0       README.MD
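How can I use the value 110 in the grammar definition to match the rest of the content? The "log size" covers the fields (here Author and Date, but there can be any number of fields) and the actual message. The last line is not part of the "log message".

What I want to get is the value of commit, a dictionary of the metadata such as Author and Date, and the actual log message, here "First commit".

The problem is that log size tells me how long the message is, but that length also includes the Author and Date fields.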
110 is the size of this string:
Author: XXXXXX XXXXXXXX <XXXXXXXXXXXXXXX@XXXXX.XXX>
Date:   Tue, 10 Apr 2012 11:19:44 +0300

    First commit
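The figure of 110 can be checked by counting characters line by line. This is a minimal sketch, assuming git's default log layout (three spaces after `Date:`, one blank line before the message, a four-space indent on the message line) and a 25-character placeholder address; the name `block` is only illustrative.

```python
# Assumed layout: git's default log format. The exact spacing is an
# assumption chosen to be consistent with the stated log size of 110.
block = (
    "Author: XXXXXX XXXXXXXX <XXXXXXXXXXXXXXX@XXXXX.XXX>\n"
    "Date:   Tue, 10 Apr 2012 11:19:44 +0300\n"
    "\n"
    "    First commit\n"
)

# Keep the line endings so that every newline is counted as well.
line_lengths = [len(line) for line in block.splitlines(True)]
print(line_lengths, sum(line_lengths))
```

Under these assumptions the four lines contribute 52, 40, 1 and 17 characters, so the fields and the message together account for the whole log size.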
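Best answer
My idea for the algorithm is the same as NPE's, but I went further and brought in regular expressions.
I extended the analysed text with a second log entry, taking care to put the correct number of characters in its "log size xxx\n" line.
regx1 splits each entry into 4 groups: the commit value, the log size, the lines that hold the dictionary (third group), and the trailing lines that come after the dictionary lines and before the next entry (fourth group).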
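Before the full script, the way a dictionary line is split can be tried on its own. The sketch below reuses the pattern that the script calls regx2; the two sample lines and the name `field` are illustrative assumptions. The `(?<! )` lookbehind stops a group from ending on a space, which is what removes the padding around keys and values.

```python
import re

# Same pattern as regx2 in the script: key, colon, value, optional padding.
# (?<! ) is a negative lookbehind that rejects a group ending on a space.
field = re.compile(r'^ *(.+?)(?<! ) *: *(.+)(?<! ) *\r?\n', re.MULTILINE)

# Illustrative input lines with deliberately irregular spacing.
lines = ('Date:   Tue, 10 Apr 2012 11:19:44 +0300\n'
         '   A key with whitespace     :     A_stupid_value \n')

fields = dict(field.findall(lines))
print(fields)
```

Because the key group is lazy, it stops at the first colon, so the colons inside the time of day stay in the value.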
import re

# Assumption: runs of spaces in the second entry are padded so that every
# line has the length listed in the result; only those lengths matter for
# the "log size" check.
ss = """commit a8c11fcee68881dfb86095aa36290fb304047cf1
log size 110
Author: XXXXXX XXXXXXXX <XXXXXXXXXXXXXXX@XXXXX.XXX>
Date:   Tue, 10 Apr 2012 11:19:44 +0300

    First commit
3       0       README.MD
blablah bla
commit 12458777AFDRE1254
log size 170
  Author: Jim  Bluefish <jimblfsh@gmail.com>
Date  :   Yesterday 21:45:01  +0800
   A key with whitespace     :     A_stupid_value 

    Funny commit
  From far from you
457       popo      not_README.MD"""

# Show every line with its length and the running total of characters.
n = 0
print('------ DISPLAY OF THE TEXT ------\n'
      ' col 1: index of line,\n'
      ' col 2: number of chars in the line\n'
      ' col 3: total of the numbers of chars of lines\n'
      ' col 4: repr(line)\n')
for j, line in enumerate(ss.splitlines(True)):
    n += len(line)
    print('%2d %2d %3d %r' % (j, len(line), n, line))

print('=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=')
print('\n\n\n------ ANALYSER 2 OF THE TEXT ------')

# Groups: 1 = commit value, 2 = log size, 3 = "key : value" lines,
# 4 = the lines after them, stopping before a line that precedes "commit".
regx1 = re.compile(r'^commit +(.+) *\r?\n'
                   r'log size +(\d+) *\r?\n'
                   r'((?:^ *.+?(?<! ) *: *.+(?<! ) *\r?\n)+)'
                   r'((?:.*\r?\n(?!commit))+)',
                   re.MULTILINE)
# Splits one "key : value" line; (?<! ) drops padding around key and value.
regx2 = re.compile(r'^ *(.+?)(?<! ) *: *(.+)(?<! ) *\r?\n',
                   re.MULTILINE)

for mat in regx1.finditer(ss):
    commit_value, logsize, dicolines, msg = mat.groups()
    print('\ncommit_value == %s\n'
          'logsize == %s'
          % (commit_value, logsize))
    print('dictionary :')
    print(dict(regx2.findall(dicolines)))
    # The log size covers the dictionary lines too, so subtract their length.
    actual_log_message = msg[0:int(logsize) - len(dicolines)].strip(' \r\n')
    print('actual_log_message ==', repr(actual_log_message))
Result
------ DISPLAY OF THE TEXT ------
 col 1: index of line,
 col 2: number of chars in the line
 col 3: total of the numbers of chars of lines
 col 4: repr(line)
 0 48  48 'commit a8c11fcee68881dfb86095aa36290fb304047cf1\n'
 1 13  61 'log size 110\n'
 2 52 113 'Author: XXXXXX XXXXXXXX <XXXXXXXXXXXXXXX@XXXXX.XXX>\n'
 3 40 153 'Date:   Tue, 10 Apr 2012 11:19:44 +0300\n'
 4  1 154 '\n'
 5 17 171 '    First commit\n'
 6 26 197 '3       0       README.MD\n'
 7 12 209 'blablah bla\n'
 8 25 234 'commit 12458777AFDRE1254\n'
 9 13 247 'log size 170\n'
10 45 292 '  Author: Jim  Bluefish <jimblfsh@gmail.com>\n'
11 36 328 'Date  :   Yesterday 21:45:01  +0800\n'
12 51 379 '   A key with whitespace     :     A_stupid_value \n'
13  1 380 '\n'
14 17 397 '    Funny commit\n'
15 20 417 '  From far from you\n'
16 33 450 '457       popo      not_README.MD'
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
------ ANALYSER 2 OF THE TEXT ------
commit_value == a8c11fcee68881dfb86095aa36290fb304047cf1
logsize == 110
dictionary :
{'Author': 'XXXXXX XXXXXXXX <XXXXXXXXXXXXXXX@XXXXX.XXX>', 'Date': 'Tue, 10 Apr 2012 11:19:44 +0300'}
actual_log_message == 'First commit'
commit_value == 12458777AFDRE1254
logsize == 170
dictionary :
{'Author': 'Jim  Bluefish <jimblfsh@gmail.com>', 'Date': 'Yesterday 21:45:01  +0800', 'A key with whitespace': 'A_stupid_value'}
actual_log_message == 'Funny commit\n  From far from you'
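The same idea can also be written without the large regular expression, by letting the log size drive the slicing. The following is only a sketch under stated assumptions: lines end with "\n", one blank line separates the fields from the message (as in the entries above), and characters equal bytes (plain ASCII). The names `HEADER`, `parse_log` and `make_entry`, and the example.com addresses, are hypothetical and not part of the answer.

```python
import re

# Assumed header shape: a `commit` line directly followed by a `log size` line.
HEADER = re.compile(r'^commit +(\S+) *\r?\nlog size +(\d+) *\r?\n', re.MULTILINE)


def parse_log(text):
    """Return a list of (commit, fields, message) tuples found in `text`."""
    entries = []
    pos = 0
    while True:
        head = HEADER.search(text, pos)
        if head is None:
            break
        size = int(head.group(2))
        # Take exactly `log size` characters after the header.
        body = text[head.end():head.end() + size]
        # Assumption: one blank line separates the fields from the message.
        field_part, _, message = body.partition('\n\n')
        fields = {}
        for line in field_part.splitlines():
            key, _, value = line.partition(':')  # split on the first colon only
            fields[key.strip()] = value.strip()
        entries.append((head.group(1), fields, message.strip()))
        # Resume after the sized block, so its content is never rescanned.
        pos = head.end() + size
    return entries


def make_entry(commit, body, trailer):
    # Hypothetical helper: derives the `log size` value from the body itself.
    return 'commit %s\nlog size %d\n%s%s' % (commit, len(body), body, trailer)


sample = (
    make_entry('a8c11fc',
               'Author: A <a@example.com>\n'
               'Date:   Tue, 10 Apr 2012 11:19:44 +0300\n'
               '\n'
               '    First commit\n',
               '3\t0\tREADME.MD\n')
    + make_entry('1245877',
                 'Author: B <b@example.com>\n'
                 'Date:   Yesterday 21:45:01 +0800\n'
                 '\n'
                 '    Funny commit\n'
                 '    From far from you\n',
                 '457\t0\tnot_README.MD')
)

entries = parse_log(sample)
for commit, fields, message in entries:
    print(commit, fields, repr(message))
```

Since the search for the next header resumes only after the sized block, a message line that happens to start with "commit" would not be mistaken for a new entry.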
On python - using a string length defined in the matched string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13658806/