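Is there a way, using a regular expression or some other logic, to extract the parts of a name from a string?
I want to split the name on spaces, but if a part has a prefix, I want to split at the prefix instead, for example:
Osama bin Laden bin Mohammed => Osama, bin Laden, bin Mohammed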
Jorge do Pinto da Silva => Jorge, do Pinto, da Silva
John Andrew Smith => John, Andrew, Smith
José Mário dos Santos Mourinho Félix => José, Mário, dos Santos, Mourinho, Félix
Working code based on Tim's suggestion:
$str = 'Manuel D\'Souza do Pinto bin Laden Al-saud el Mecca de la Vere Na Sokakah van Der Reidejin del Monte du Pont ter Johannes';
// Match a known prefix plus the word after it, or else a single word.
// Multi-word prefixes are listed before their shorter parts so they are tried first.
preg_match_all( '~\b(von der|van de|van den|del la|de la|van der|vande|vanden|vander|st|der|des|dela|della|bin|dos|ur|ibn|bint|da|do|le|la|del|du|de|di|el|al|van|von|ter|na|san|los)\s+[^\s]+\b|\b[^\s]+~i', $str, $mat );
print_r( $mat );
Result:
Array(
[0] => Array
(
[0] => Manuel
[1] => D'Souza
[2] => do Pinto
[3] => bin Laden
[4] => Al-saud
[5] => el Mecca
[6] => de la Vere
[7] => Na Sokakah
[8] => van Der Reidejin
[9] => del Monte
[10] => du Pont
[11] => ter Johannes
)
[1] => Array
(
[0] =>
[1] =>
[2] => do
[3] => bin
[4] =>
[5] => el
[6] => de la
[7] => Na
[8] => van Der
[9] => del
[10] => du
[11] => ter
)
)
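The pattern above only works because longer prefixes such as "de la" appear in the alternation before shorter ones such as "de". As a rough sketch of the same idea, the alternation can be built from an array that is sorted longest-first. The function name `splitNameParts` and the short prefix list are made up for illustration and are not from the original post; arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not from the original post): split a name into parts,
// keeping each listed prefix attached to the word that follows it.
function splitNameParts(string $name, array $prefixes): array
{
    // Longest prefixes first, so "de la" is tried before "de".
    usort($prefixes, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape each prefix and join them into one alternation.
    $alternation = implode('|', array_map(
        fn($p) => preg_quote($p, '~'),
        $prefixes
    ));

    // Either: prefix + whitespace + next word, or: any single word.
    $pattern = '~\b(?:' . $alternation . ')\s+\S+|\S+~iu';

    preg_match_all($pattern, $name, $matches);
    return $matches[0];
}

// Illustrative prefix list; a real one would be much longer.
$prefixes = ['da', 'do', 'de la', 'de', 'bin'];
print_r(splitNameParts('Jorge do Pinto da Silva', $prefixes));
```

With this short list, 'Jorge do Pinto da Silva' yields Jorge, do Pinto, da Silva. Any prefix missing from the list is simply treated as an ordinary word.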
Best answer
Bearing in mind all of these falsehoods programmers believe about names, you could still try
\b\p{Lu}\p{Ll}*|\b\p{Ll}+\s+\p{Lu}\p{Ll}*
This matches either a capitalized word (a name) or a lowercase prefix followed by a capitalized word.
Explanation:
\b # Start of word
\p{Lu} # One uppercase letter
\p{Ll}* # Any number of lowercase letters
| # or
\b # Start of word
\p{Ll}+ # One or more lowercase letters
\s+ # Whitespace
\p{Lu} # One uppercase letter
\p{Ll}* # Any number of lowercase letters
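To try this pattern from PHP, something like the following minimal sketch should work. The function name `splitByCase` is made up for illustration; the `u` modifier is added on the assumption that the input is UTF-8, so that `\p{Lu}` and `\p{Ll}` also apply to accented letters.

```php
<?php
// Illustrative helper name; the pattern itself is the one from the answer.
function splitByCase(string $name): array
{
    // "u" makes PCRE treat both pattern and subject as UTF-8.
    $pattern = '~\b\p{Lu}\p{Ll}*|\b\p{Ll}+\s+\p{Lu}\p{Ll}*~u';

    preg_match_all($pattern, $name, $matches);
    return $matches[0];
}

print_r(splitByCase('Osama bin Laden bin Mohammed'));
print_r(splitByCase('John Andrew Smith'));
```

For these inputs the parts are Osama, bin Laden, bin Mohammed and John, Andrew, Smith. Because the pattern relies purely on letter case, it handles only a single lowercase prefix word: in "de la Vere" the "de" is dropped, and "D'Souza" is split at the apostrophe into "D" and "Souza".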
This question about a regex to extract surnames with prefixes comes from Stack Overflow: https://stackoverflow.com/questions/23610277/