The file I am trying to process looks like this:
...
...
15 Apr 2014 22:05 - id: content
15 Apr 2014 22:09 - id: content
15 Apr 2014 22:09 - id: content
with new line
16 Apr 2014 06:56 - id: content
with new line
with new line
16 Apr 2014 06:57 - id: content
16 Apr 2014 06:58 - id: content
...
...
The regular expression I came up with is: \d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*
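The result is almost correct: I only need the matches to include the continuation lines after each date line. But if I use [\s\S]* instead of .* then only a single match is returned.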
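To illustrate the difference, here is a minimal sketch. The shortened sample and the names sample, datePrefix, perLine and greedy are assumptions made for this example, and the date pattern is a simplified form of the one above. The dot stops at line breaks, while the greedy [\s\S]* keeps going to the end of the input, so everything lands in one match.

```javascript
// Shortened sample in the same shape as the log above (assumed for illustration).
const sample = [
  '15 Apr 2014 22:05 - id: content',
  '15 Apr 2014 22:09 - id: content',
  'with new line',
  '16 Apr 2014 06:56 - id: content',
].join('\n');

// Simplified date prefix, e.g. "15 Apr 2014 22:05".
const datePrefix = '\\d{1,2} [A-Z][a-z]{2} \\d{4} \\d{2}:\\d{2}';

// `.` does not match line terminators, so each match ends with its own line
// and the "with new line" continuation is lost.
const perLine = sample.match(new RegExp(datePrefix + '.*', 'g'));

// `[\s\S]*` is greedy and also matches newlines, so the first match
// swallows the rest of the input.
const greedy = sample.match(new RegExp(datePrefix + '[\\s\\S]*', 'g'));

console.log(perLine.length); // 3
console.log(greedy.length);  // 1
```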
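What I want to extract is a set of substrings, where each substring starts at a date sequence and ends just before the next date sequence, like this: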
...
...
15 Apr 2014 22:05 - id: content //substring 1
15 Apr 2014 22:09 - id: content //substring 2
15 Apr 2014 22:09 - id: content //substring 3
with new line //substring 3
16 Apr 2014 06:56 - id: content //substring 4
with new line //substring 4
with new line //substring 4
16 Apr 2014 06:57 - id: content //substring 5
16 Apr 2014 06:58 - id: content //substring 6
...
...
Any help with what I am missing?
Best answer
You need to use a positive lookahead assertion.
\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!\n\n)[\s\S])*?(?=\n\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*
> var str = '...\n...\n15 Apr 2014 22:05 - id: content\n15 Apr 2014 22:09 - id: content\n15 Apr 2014 22:09 - id: content\nwith new line\n16 Apr 2014 06:56 - id: content\nwith new line\nwith new line\n16 Apr 2014 06:57 - id: content\n\n16 Apr 2014 06:58 - id: content\n...\n...';
undefined
> var re = /\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!\n\n)[\s\S])*?(?=\n\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*/gm;
undefined
> str.match(re)
[ '15 Apr 2014 22:05 - id: content',
'15 Apr 2014 22:09 - id: content',
'15 Apr 2014 22:09 - id: content\nwith new line',
'16 Apr 2014 06:56 - id: content\nwith new line\nwith new line',
'16 Apr 2014 06:57 - id: content\n',
'16 Apr 2014 06:58 - id: content' ]
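As a possible simplification (not the pattern from the answer itself), the same idea can be sketched with one lazy body followed by a lookahead that stops at the next date line or at the end of the input. The names DATE, entryRe, log and entries are assumptions for this example, and the date pattern is narrowed to the "15 Apr 2014 22:05" shape seen in the sample.

```javascript
// Simplified date pattern, e.g. "15 Apr 2014 22:05" (an assumption based on the sample).
const DATE = String.raw`\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}`;

// Date, then as little as possible of anything (newlines included),
// until a newline followed by another date, or the end of the input.
// No `m` flag here, so `$` only matches at the very end of the string.
const entryRe = new RegExp(`${DATE}[\\s\\S]*?(?=\\n${DATE}|$)`, 'g');

const log = [
  '15 Apr 2014 22:05 - id: content',
  '15 Apr 2014 22:09 - id: content',
  '15 Apr 2014 22:09 - id: content',
  'with new line',
  '16 Apr 2014 06:56 - id: content',
  'with new line',
  'with new line',
  '16 Apr 2014 06:57 - id: content',
  '16 Apr 2014 06:58 - id: content',
].join('\n');

const entries = log.match(entryRe);
console.log(entries.length); // 6
console.log(JSON.stringify(entries[3]));
// "16 Apr 2014 06:56 - id: content\nwith new line\nwith new line"
```

One caveat with this sketch: the last entry runs to the end of the input, so any trailing non-log lines would be included in it and would need to be trimmed separately.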
Source: a similar question on Stack Overflow (javascript - regex matching anything, including new lines, up to a certain sequence, with multiple substrings): https://stackoverflow.com/questions/28659717/