In PHP versions 5.3.4 - 5.5.0beta1, are \w and \pL equivalent?
<?php
preg_match_all('#\w#u','سیب',$f);
var_dump($f);
preg_match_all('#\pL#u','سیب',$f);
var_dump($f);
array(1) {
[0]=>
array(3) {
[0]=>
string(2) "س"
[1]=>
string(2) "ی"
[2]=>
string(2) "ب"
}
}
array(1) {
[0]=>
array(3) {
[0]=>
string(2) "س"
[1]=>
string(2) "ی"
[2]=>
string(2) "ب"
}
}
Best answer
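It looks like when you use the u modifier in a PCRE regular expression, PHP sets the PCRE_UCP flag in addition to the PCRE_UTF8 flag. As a result, Unicode properties are used for \w, the other shorthand classes, and some POSIX character classes, instead of only the default ASCII characters. From the man page on PCRE: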
PCRE_UCP
This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but if PCRE_UCP is set, Unicode properties are used instead to classify characters.
This is confirmed in the PHP source code (lines 366-372), where we see:
case 'u': coptions |= PCRE_UTF8;
/* In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
characters, even in UTF-8 mode. However, this can be changed by setting
the PCRE_UCP option. */
#ifdef PCRE_UCP
coptions |= PCRE_UCP;
#endif
So, from the same man page I linked above, you can see that when PCRE_UCP is set, the character classes become:
\d any character that \p{Nd} matches (decimal digit)
\s any character that \p{Z} matches, plus HT, LF, FF, CR
\w any character that \p{L} or \p{N} matches, plus underscore
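In other words, with the u modifier \w matches everything \pL matches, plus digits and the underscore, so the two only give identical results on input made up purely of letters, like the string in the question. Here is a minimal sketch of that difference, assuming a PHP build where PCRE_UCP is defined (the sample string and variable names are mine, not from the question):

```php
<?php
// Persian letters followed by an underscore and an ASCII digit.
$subject = 'سیب' . '_1';

// \w with the u modifier: letters, numbers and underscore (when PCRE_UCP is set).
preg_match_all('#\w#u', $subject, $word);

// \pL with the u modifier: Unicode letters only.
preg_match_all('#\pL#u', $subject, $letters);

// Characters that \w matched but \pL did not.
$onlyWord = array_values(array_diff($word[0], $letters[0]));

var_dump(count($word[0]));    // number of \w matches
var_dump(count($letters[0])); // number of \pL matches
var_dump($onlyWord);          // what \w picked up in addition to the letters
```

On such a build I would expect \w to return more matches than \pL here, with the extra ones being the underscore and the digit.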
Original question on Stack Overflow (the regex metacharacters \w and \pL in PHP): https://stackoverflow.com/questions/15696801/