php - Regex metacharacters \w and \pL in PHP

Tags: php regex

Are \w and \pL equivalent in PHP versions 5.3.4 through 5.5.0beta1?

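<?php
// Match every "word" character, with the u (UTF-8) modifier.
preg_match_all('#\w#u','سیب',$f);
var_dump($f);

// Match every Unicode letter, with the same modifier.
preg_match_all('#\pL#u','سیب',$f);
var_dump($f);

Output: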

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(2) "س"
    [1]=>
    string(2) "ی"
    [2]=>
    string(2) "ب"
  }
}
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(2) "س"
    [1]=>
    string(2) "ی"
    [2]=>
    string(2) "ب"
  }
}

Try the above snippet in the Online PHP shell

Best answer

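It looks like when you use the u modifier in a PCRE regex, PHP sets the PCRE_UCP flag in addition to PCRE_UTF8. As a result, \w, the other shorthand classes, and some POSIX character classes are classified using Unicode properties instead of matching only the default ASCII characters. From the man page on PCRE: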

PCRE_UCP

This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but if PCRE_UCP is set, Unicode properties are used instead to classify characters.

This is confirmed in the PHP source code (lines 366-372), where we see:

        case 'u':   coptions |= PCRE_UTF8;
/* In  PCRE,  by  default, \d, \D, \s, \S, \w, and \W recognize only ASCII
   characters, even in UTF-8 mode. However, this can be changed by setting
   the PCRE_UCP option. */
#ifdef PCRE_UCP
                    coptions |= PCRE_UCP;
#endif
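
One way to observe the effect of these flags from the PHP side is to run the same \w pattern with and without the u modifier. The following is only a rough sketch: it assumes a PCRE build with Unicode property support, and the variable names are made up for illustration. Without the u modifier the subject is scanned byte by byte and the result can depend on the active LC_CTYPE locale, so no particular count is claimed for that case.

```php
<?php
// Sketch: compare \w with and without the u modifier.
// Assumption: PCRE was built with Unicode property support, so the
// u modifier can enable PCRE_UCP as in the source excerpt above.
$subject = 'سیب';

// No u modifier: the subject is treated as a sequence of bytes, and
// \w follows PCRE's (possibly locale-dependent) byte tables.
$byteModeCount = preg_match_all('#\w#', $subject, $byteModeMatches);

// u modifier: PCRE_UTF8 is set, and PCRE_UCP where it is available,
// so \w is classified via Unicode properties.
$utf8ModeCount = preg_match_all('#\w#u', $subject, $utf8ModeMatches);

var_dump($byteModeCount);
var_dump($utf8ModeCount);
```

With the u modifier the three letters of the sample string are matched individually, which agrees with the output shown in the question.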

So, from the same man page linked above, you can see that when PCRE_UCP is set, the character classes become:

\d any character that \p{Nd} matches (decimal digit)

\s any character that \p{Z} matches, plus HT, LF, FF, CR

\w any character that \p{L} or \p{N} matches, plus underscore
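
Going by these definitions, \w under the u modifier covers everything \pL does and additionally numbers and the underscore, so the two only give identical results on input made up purely of letters, such as the string in the question. A minimal sketch of that difference follows; the helper name matchChars and the extended sample string are hypothetical, chosen just to include an underscore and digits.

```php
<?php
// Sketch only: assumes the u modifier enables PCRE_UCP, as described above.
// matchChars() and the sample string are illustrative, not from the question.

/** Return every single-character match of $pattern in $subject. */
function matchChars($pattern, $subject)
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

// Three Arabic-script letters, an underscore, an Arabic-Indic digit,
// a space, an ASCII letter and an ASCII digit.
$subject = 'سیب' . '_' . '٣' . ' a1';

$word    = matchChars('#\w#u', $subject);   // letters, numbers, underscore
$letters = matchChars('#\pL#u', $subject);  // letters only

// Characters matched by \w but not by \pL.
$onlyWord = array_values(array_diff($word, $letters));

var_dump(count($word));
var_dump(count($letters));
var_dump($onlyWord);
```

Here the space is matched by neither pattern, while the underscore and both digits are matched by \w only.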

Regarding php - Regex metacharacters \w and \pL in PHP, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15696801/
