php - Regex lookbehind problem

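I'm trying to write a regular expression to extract blocks of text from a history file I keep for a project I'm building. For now I plan to do the extraction by hand in my text editor (TextMate or Sublime Text 2), but eventually I'll turn it into a scripted process using Python or PHP (not yet decided).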

Every entry in my history file has the following format:

YYYY-MM-DD - Chris -- Version: X.X.X
====================================
- Lorem ipsum dolor sit amet, vim id libris epicuri
- Et eos veri quodsi appetere, an qui saepe malorum eloquentiam.
...

--

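where X is the version number of the work that was completed.

I'm trying to extract everything from the version number through the final double-dash separator (which marks the end of the text block).

I started by writing a regular expression that selects a valid section header: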

(^[\d]{4}-[\d]{2}-[\d]{2}\s-\s[\w]+\s--\sVersion:\s)[\d\.]+$

But when I try to turn the parenthesized pattern into a lookbehind, it fails:

(?<=^[\d]{4}-[\d]{2}-[\d]{2}\s-\s[\w]+\s--\sVersion:\s)[\d\.]+$

I've been looking around, and so far this lookbehind syntax appears to be correct. I can't figure out what I'm missing. Any ideas?

Best answer

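As Joey stated, there is no arbitrary-length lookbehind in PHP or Python. Your lookbehind contains \w+, which has no fixed length, so it is rejected. But PHP has a workaround: the \K escape sequence.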

From the docs:

The escape sequence \K causes any previously matched characters not to be included in the final matched sequence. For example, the pattern:
   foo\Kbar
matches "foobar", but reports that it has matched "bar". This feature is similar to a lookbehind assertion (described below). However, in this case, the part of the subject before the real match does not have to be of fixed length, as lookbehind assertions do.

After removing some redundant square brackets [], your expression would look like this:

(?m)^\d{4}-\d{2}-\d{2}\s-\s\w+\s--\sVersion:\s\K[\d.]+$

Online demo
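
Since the question leaves open whether the script will be PHP or Python, here is a rough Python sketch of the same idea. Python's built-in re module has no \K, so this swaps in a capturing group instead: the header prefix is matched but only the group is returned. The sample entry (its date and version number) and the names HEADER_PREFIX and find_versions are made up for illustration.

```python
import re

# Header prefix from the answer, without the redundant brackets.
HEADER_PREFIX = r"^\d{4}-\d{2}-\d{2}\s-\s\w+\s--\sVersion:\s"

# Hypothetical history entry in the format described in the question.
sample = """2013-11-24 - Chris -- Version: 1.2.3
====================================
- Lorem ipsum dolor sit amet
- Et eos veri quodsi appetere

--
"""

# The lookbehind form does not compile with the built-in re module,
# because \w+ makes the lookbehind variable-width.
try:
    re.compile(r"(?<=" + HEADER_PREFIX + r")[\d.]+$", re.MULTILINE)
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Capture-group form: match the whole header line, keep only the version.
version_re = re.compile(HEADER_PREFIX + r"([\d.]+)$", re.MULTILINE)


def find_versions(text):
    """Return the version strings of all headers found in text."""
    return version_re.findall(text)


print(lookbehind_ok)
print(find_versions(sample))
```

With a single capturing group, findall returns just the captured text, which gives much the same result as \K does in PHP.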

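Notes:

(?m) : an inline regex modifier (it turns on multiline mode, so ^ and $ match at the start and end of each line)
You don't need to escape the dot . inside a character class: [.] matches a literal dot, not any character
You can add quantifiers to the whitespace characters: \s* or \s+
\w+ also matches the underscore _, so to exclude it you can use [^\W_]+
Regex is awesome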
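
The notes above can be checked with a small Python sketch. The variable names and the sample strings (such as "chris_dev" and the loosely spaced header line) are invented for illustration, and the tolerant header pattern is only one possible way to combine the suggestions, not the answer's own expression.

```python
import re

# Inside a character class an unescaped dot is a literal dot.
dot_class = re.compile(r"[.]")
dots_found = dot_class.findall("a.b")

# \w accepts the underscore; [^\W_] accepts word characters except it.
word = re.compile(r"\w+")
no_underscore = re.compile(r"[^\W_]+")

# Header pattern with (?m) inline, quantified whitespace, and [^\W_]+
# for the author name. Groups: (author, version).
header = re.compile(
    r"(?m)^\d{4}-\d{2}-\d{2}\s+-\s+([^\W_]+)\s+--\s+Version:\s*([\d.]+)$"
)

# Hypothetical header line with irregular spacing.
loose_line = "2013-11-24  -  Chris --   Version:1.0"
match = header.search(loose_line)

print(dots_found)
print(bool(word.fullmatch("chris_dev")), bool(no_underscore.fullmatch("chris_dev")))
print(match.groups() if match else None)
```

One caveat with the quantified form: \s also matches newlines, so \s+ can in principle span a line break. Using [ \t]+ instead would keep the match on a single line.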

Regarding php - Regex lookbehind problem, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20179465/
