php - Regex to split TitleCase words

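My regex doesn't really work for splitting TitleCase words in PHP. The author's name after "From" runs straight into the first word of the article, and I want a space inserted there. Articles without an author should not be affected by the regex.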

My current regex: From (\S+\s){2}(?<=[a-z])(?=[A-Z])

Here is my regex demo.

Input:

From Günther RossmannThis is the article From Harry Gregson-WilliamsAnother article text From Nora WaldstättenSome lorem ipsum stuff From the fantastic architect of the year Text without an author

Expected output:

From Günther Rossmann This is the article From Harry Gregson-Williams Another article text From Nora Waldstätten Some lorem ipsum stuff From the fantastic architect of the year Text without an author

Best answer

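With the {2} quantifier, your pattern expands to \S+\s\S+\s, but there is no whitespace between the lowercase and the uppercase letter. Since the group always ends on a whitespace character, the lookbehind (?<=[a-z]) that follows can never succeed.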
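A quick PHP check (a sketch; the input is shortened from the question) confirms that the original pattern finds no match at all:

```php
<?php
// The pattern from the question, exactly as written (no u modifier).
$original = '~From (\S+\s){2}(?<=[a-z])(?=[A-Z])~';

$input = 'From Günther RossmannThis is the article From Harry Gregson-WilliamsAnother article text';

// (\S+\s){2} always stops right after a whitespace character,
// so the lookbehind (?<=[a-z]) that follows always fails.
$found = preg_match($original, $input);

var_dump($found); // int(0)
```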

You can use

'~From\s+(\S+\s\S+)(?![^\p{Lu}])~u'

See the regex demo.

Details

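From - a literal substring
\s+ - 1 or more whitespace characters
(\S+\s\S+) - Group 1: one or more non-whitespace characters, 1 whitespace character and 1 or more non-whitespace characters
(?![^\p{Lu}]) - followed by an uppercase letter or the end of the string.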
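The answer only gives the matching pattern. One way to use it for the actual split, assumed here rather than taken from the answer, is preg_replace with the replacement '$0 ', which writes back the whole match followed by a space:

```php
<?php
// Full input string from the question.
$input = 'From Günther RossmannThis is the article From Harry Gregson-WilliamsAnother article text '
    . 'From Nora WaldstättenSome lorem ipsum stuff From the fantastic architect of the year Text without an author';

// $0 is the whole match, e.g. "From Günther Rossmann"; appending a space
// separates the name from the TitleCase word glued to it.
// "From the fantastic ..." is not matched, so it is left unchanged.
$output = preg_replace('~From\s+(\S+\s\S+)(?![^\p{Lu}])~u', '$0 ', $input);

echo $output, PHP_EOL;
```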

Or, use a more specific one:

'~From\s+(\p{Lu}\p{Ll}*\s+\p{Lu}\p{Ll}*)~u'
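A minimal sketch of what this stricter pattern captures on the question's input, using preg_match_all. Because \p{Ll}* stops at the hyphen, the double-barrelled surname is cut off:

```php
<?php
$input = 'From Günther RossmannThis is the article From Harry Gregson-WilliamsAnother article text '
    . 'From Nora WaldstättenSome lorem ipsum stuff From the fantastic architect of the year Text without an author';

// Exactly two capitalized words separated by whitespace.
preg_match_all('~From\s+(\p{Lu}\p{Ll}*\s+\p{Lu}\p{Ll}*)~u', $input, $matches);

// $matches[1] holds the Group 1 values:
// Günther Rossmann, Harry Gregson, Nora Waldstätten
print_r($matches[1]);
```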

Or, to also support apostrophes or hyphens:

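'~From\h+(\p{Lu}\p{Ll}*(?:[\h\'-]\p{Lu}\p{Ll}*)*)~u'

The hyphen is placed last in the character class so that it is always literal (PHP 7.3+ uses PCRE2, which rejects [\h-'] with an "invalid range in character class" error), and the apostrophe is escaped because the pattern sits in a single-quoted PHP string.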

See this regex demo. Here, \p{Lu} matches one uppercase letter and \p{Ll}* matches 0 or more lowercase letters.
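A sketch running the apostrophe/hyphen variant through preg_match_all on the question's input. The class is written as [\h\'-] (hyphen last, apostrophe escaped for the single-quoted PHP string) so that it also compiles on PHP 7.3+:

```php
<?php
$input = 'From Günther RossmannThis is the article From Harry Gregson-WilliamsAnother article text '
    . 'From Nora WaldstättenSome lorem ipsum stuff From the fantastic architect of the year Text without an author';

// Group 1: capitalized words joined by horizontal whitespace, an apostrophe or a hyphen.
$pattern = '~From\h+(\p{Lu}\p{Ll}*(?:[\h\'-]\p{Lu}\p{Ll}*)*)~u';

preg_match_all($pattern, $input, $matches);

// Günther Rossmann, Harry Gregson-Williams, Nora Waldstätten
print_r($matches[1]);
```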

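Note that, for easier access to the result, you can even drop the capturing group and use the \K operator, which omits the text matched so far from the match value: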

'~From\h+\K\p{Lu}\p{Ll}*(?:[\h\'-]\p{Lu}\p{Ll}*)*~u'

See this regex demo.
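With \K the names land directly in $matches[0], and the same pattern can drive preg_replace, since only the part after \K is replaced and "From " stays untouched. The '$0 ' replacement is again an assumption about how the split is meant to be done:

```php
<?php
$input = 'From Günther RossmannThis is the article From Harry Gregson-WilliamsAnother article text '
    . 'From Nora WaldstättenSome lorem ipsum stuff From the fantastic architect of the year Text without an author';

// \K drops "From " from the reported match, so $0 is just the name.
$pattern = '~From\h+\K\p{Lu}\p{Ll}*(?:[\h\'-]\p{Lu}\p{Ll}*)*~u';

preg_match_all($pattern, $input, $matches);
print_r($matches[0]); // names only, without "From "

// Replace each name with itself plus a space to split it from the glued word.
$output = preg_replace($pattern, '$0 ', $input);
echo $output, PHP_EOL;
```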

Note that you should use the u modifier when working with Unicode property classes such as \p{Lu} and with Unicode (UTF-8) strings.
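A small sketch of the difference the u modifier makes, on one sentence from the question's input. Without u, PCRE reads the subject byte by byte, so ä (two bytes in UTF-8) is not seen as one lowercase letter and the name is not matched in full:

```php
<?php
$input   = 'From Nora WaldstättenSome lorem ipsum stuff';
$pattern = '~From\h+\K\p{Lu}\p{Ll}*(?:[\h\'-]\p{Lu}\p{Ll}*)*~';

// With u: the subject is treated as UTF-8, so ä counts as \p{Ll}.
$withU = preg_replace($pattern . 'u', '$0 ', $input);

// Without u: the pattern works on raw bytes and stops inside "Waldstätten".
$withoutU = preg_replace($pattern, '$0 ', $input);

echo $withU, PHP_EOL; // From Nora Waldstätten Some lorem ipsum stuff
echo $withoutU, PHP_EOL;
```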

A similar question about splitting TitleCase words with a PHP regex can be found on Stack Overflow: https://stackoverflow.com/questions/47513726/