c++ - boost::regex and case-insensitive UTF-8 matching (e.g., uppercase vs. lowercase umlauts)

Tags: c++ regex boost utf-8

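After building the boost::regex library, version 1.52, with International Components for Unicode (ICU) support, case-insensitive regular expression matching does not seem to handle uppercase and lowercase German umlaut characters as expected.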

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
    // UTF-8 byte sequences written as octal escapes:
    // "\303\226" is U+00D6 (Ö), "\303\266" is U+00F6 (ö).
    static const std::string pattern("^.*" "\303\226" ".*$");
    static const std::string   test1("SCH" "\303\226" "NE");
    static const std::string   test2("sch" "\303\266" "ne");
    static const boost::regex exp(pattern, boost::regex::icase);
    const char *result = (boost::regex_match(test1, exp)) ? "Match" : "NoMatch";
    std::cout << "Testing \"" << test1 << "\" against pattern \"" << pattern
        << "\" : " << result << std::endl;
    result = (boost::regex_match(test2, exp)) ? "Match" : "NoMatch";
    std::cout << "Testing \"" << test2 << "\" against pattern \"" << pattern
        << "\" : " << result << std::endl;
}

Output:

Testing "SCHÖNE" against pattern "^.*Ö.*$" : Match
Testing "schöne" against pattern "^.*Ö.*$" : NoMatch

Best answer

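See the Boost.Regex documentation section "Working with Unicode and ICU string types". boost::regex is a regex over char, so it sees the UTF-8 input one byte at a time and icase is applied per byte; Ö ("\303\226") and ö ("\303\266") are two-byte sequences that differ in their second byte, which is not a letter on its own. The ICU-aware boost::u32regex, together with make_u32regex and u32regex_match, works on whole Unicode code points instead.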

Example (originally on LWS):

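#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/regex/icu.hpp>   // boost::u32regex, make_u32regex, u32regex_match

int main()
{
   // UTF-8 byte sequences written as octal escapes:
   // "\303\226" is U+00D6 (Ö), "\303\266" is U+00F6 (ö).
   static const std::string pattern("^.*" "\303\226" ".*$");
   static const std::string   test1("SCH" "\303\226" "NE");
   static const std::string   test2("sch" "\303\266" "ne");
   // make_u32regex interprets the std::string pattern as UTF-8, so icase
   // applies to whole code points rather than to individual bytes.
   static const boost::u32regex exp = boost::make_u32regex(pattern, boost::regex::icase);
   // u32regex_match likewise treats a std::string subject as UTF-8.
   const char *result = (boost::u32regex_match(test1, exp)) ? "Match" : "NoMatch";
   std::cout << "Testing \"" << test1 << "\" against pattern \"" << pattern
      << "\" : " << result << std::endl;
   result = (boost::u32regex_match(test2, exp)) ? "Match" : "NoMatch";
   std::cout << "Testing \"" << test2 << "\" against pattern \"" << pattern
      << "\" : " << result << std::endl;
}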

This Q&A on boost::regex and case-insensitive UTF-8 matching comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/15906136/
