Python findall with optional text

I want to extract values from a string like this:

=== START AAA
one: 11 
two: 22
=== START BBB
one: 44
two: 55
three: 66

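The "three" parameter is optional. I could parse the input line by line, but I am trying to do it with re.findall. I added .*? and (===|$) so that the whole string is not consumed at once. I have tried many approaches, and this seems to be the closest: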

stats_re = re.compile(r'START (\S+).*?one\s*:\s*(\S+).*?two\s*:\s+(\S+).*?(three\s*:\s+(\S+))?(===|$)', re.DOTALL)

This produces:

('AAA', '11', '22', '', '', '===')
('BBB', '44', '55', '', '', '')

This almost works, except that I do not get the value of the "three" parameter when it is present.

Best answer

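Your regex is basically fine; it is just missing a few things.
Above all, you need to use .*? consistently to consume the text between fields. In the original pattern nothing can consume text between the value of "three" and the closing (===|$), so if any character follows that value (trailing whitespace or a carriage return, for example), the optional group is skipped and the preceding .*? swallows the "three" line instead.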

I have commented the parts that needed changing.
Here is the modified regex:

 START[ ]+(\S+).*?one\s*:\s*(\S+).*?two\s*:\s+(\S+).*?(?:three\s*:\s+(\S+).*?)?(===|$)
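To see the difference in Python, here is a minimal sketch that runs both patterns through re.findall. The trailing space after "66" in the sample text is an assumption made here to reproduce the result reported in the question; the variable names are illustrative only.

```python
import re

# Sample input from the question. The trailing space after "66" is an
# assumption, added to reproduce the failure described in the question.
text = (
    "=== START AAA\n"
    "one: 11 \n"
    "two: 22\n"
    "=== START BBB\n"
    "one: 44\n"
    "two: 55\n"
    "three: 66 \n"
)

# Pattern from the question: nothing can consume text after the "three" value.
original_re = re.compile(
    r'START (\S+).*?one\s*:\s*(\S+).*?two\s*:\s+(\S+).*?(three\s*:\s+(\S+))?(===|$)',
    re.DOTALL,
)

# Modified pattern: non-capturing optional group with a trailing '.*?'.
fixed_re = re.compile(
    r'START[ ]+(\S+).*?one\s*:\s*(\S+).*?two\s*:\s+(\S+).*?(?:three\s*:\s+(\S+).*?)?(===|$)',
    re.DOTALL,
)

original_rows = original_re.findall(text)
fixed_rows = fixed_re.findall(text)

print(original_rows)  # the "three" value is missing for BBB
print(fixed_rows)     # the "three" value is captured for BBB
```

Note that re.findall returns an empty string for a group that did not take part in the match, so a missing "three" shows up as '' rather than None.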

Expanded:

 START [ ]+                    # <- Added '+'
 ( \S+ )                       # (1)
 .*? one \s* : \s* 
 ( \S+ )                       # (2)
 .*? two \s* : \s+ 
 ( \S+ )                       # (3)
 .*? 
 (?:                           # <- Converted to non-capture
      three \s* : \s+ 
      ( \S+ )                       # (4)
      .*?                           # <- Added '.*?'
 )?
 ( === | $ )                   # (5)
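The expanded layout can be used almost verbatim in Python by compiling with re.VERBOSE, which ignores whitespace outside character classes and allows # comments. The following is a sketch only: the comments are reworded, and the sample text assumes plain "\n" line endings.

```python
import re

# Expanded form of the modified pattern. re.VERBOSE ignores the layout
# whitespace; the space inside [ ] is kept because it is in a character class.
stats_re = re.compile(
    r"""
    START [ ]+                # literal START followed by one or more spaces
    ( \S+ )                   # (1) section name
    .*? one \s* : \s*
    ( \S+ )                   # (2) value of one
    .*? two \s* : \s+
    ( \S+ )                   # (3) value of two
    .*?
    (?:                       # optional, non-capturing wrapper
        three \s* : \s+
        ( \S+ )               # (4) value of three, if present
        .*?                   # consume whatever follows the value
    )?
    ( === | $ )               # (5) next section marker or end of input
    """,
    re.VERBOSE | re.DOTALL,
)

text = (
    "=== START AAA\none: 11 \ntwo: 22\n"
    "=== START BBB\none: 44\ntwo: 55\nthree: 66\n"
)

rows = stats_re.findall(text)
for row in rows:
    print(row)
```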

Output:

 **  Grp 0 -  ( pos 4 , len 33 ) 
START AAA
one: 11 
two: 22
===  
 **  Grp 1 -  ( pos 10 , len 3 ) 
AAA  
 **  Grp 2 -  ( pos 20 , len 2 ) 
11  
 **  Grp 3 -  ( pos 30 , len 2 ) 
22  
 **  Grp 4 -  NULL 
 **  Grp 5 -  ( pos 34 , len 3 ) 
===  

--------------------

 **  Grp 0 -  ( pos 38 , len 39 ) 
START BBB
one: 44
two: 55
three: 66   
 **  Grp 1 -  ( pos 44 , len 3 ) 
BBB  
 **  Grp 2 -  ( pos 54 , len 2 ) 
44  
 **  Grp 3 -  ( pos 63 , len 2 ) 
55  
 **  Grp 4 -  ( pos 74 , len 2 ) 
66  
 **  Grp 5 -  ( pos 77 , len 0 )  EMPTY
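A listing like the one above can be approximated in Python with re.finditer and Match.span. This is a rough sketch: the describe helper is hypothetical, and the positions it reports depend on the line endings of the input, so they may not match the listing exactly.

```python
import re

stats_re = re.compile(
    r'START[ ]+(\S+).*?one\s*:\s*(\S+).*?two\s*:\s+(\S+).*?(?:three\s*:\s+(\S+).*?)?(===|$)',
    re.DOTALL,
)

# Sample input, assuming plain "\n" line endings.
text = (
    "=== START AAA\none: 11 \ntwo: 22\n"
    "=== START BBB\none: 44\ntwo: 55\nthree: 66\n"
)

def describe(match):
    """Hypothetical helper: one line per capture group with position and length."""
    lines = []
    for g in range(1, match.re.groups + 1):
        if match.group(g) is None:
            # With finditer, a group that did not participate is None (not '').
            lines.append(f"Grp {g} - NULL")
        else:
            start, end = match.span(g)
            lines.append(f"Grp {g} - (pos {start}, len {end - start}) {match.group(g)!r}")
    return lines

reports = [describe(m) for m in stats_re.finditer(text)]
for report in reports:
    print("\n".join(report))
    print("-" * 20)
```

Unlike re.findall, the match objects let you tell a group that was skipped (None) apart from one that matched an empty string, such as group 5 at the end of the input.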

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/27931973/
