C++11 regex not returning groups as expected

Tags: c++ c++11 gcc

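I want to write a regular expression that parses the syntax of a specially formatted string. It should help me detect format errors and split the format string into separate parts for further processing.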

However, despite my best efforts, I cannot get the splitting to work as expected.

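From what I have read in the documentation, the "(?: )" syntax should define a non-capturing group, while a plain parenthesized expression "( )" should define a sub-match that is returned separately. But that is not what happens.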

Here is my code:

#include <cstdlib>   // exit, EXIT_FAILURE
#include <iostream>
#include <regex>
#include <string>

std::string parseCode( std::regex_constants::error_type etype);

int main()
{
   const std::string  regex_str( "(?:([^\\[]+)(\\[[^\\]]*\\])( +|\\n))");
   std::regex  atr;

   std::cout << "regex string = '" << regex_str << "'" << std::endl;

   try
   {
      atr.assign( regex_str);
   } catch (const std::regex_error& e)
   {
      std::cerr << "Error: " << e.what() << "; code: " << parseCode(e.code()) << std::endl;
      exit( EXIT_FAILURE);
   } // end try

   {
      const std::string  title( "First Title[]  Second Title[]   -Third Title[]");
      auto  regex_begin = std::sregex_iterator( title.begin(), title.end(), atr);

      for (std::sregex_iterator i = regex_begin; i != std::sregex_iterator(); ++i)
      {
         std::smatch  match = *i;
         std::cout << "got: '" << match.str() << "'" << std::endl;
      } // end for

      auto  subregex_begin = std::sregex_token_iterator( title.begin(),
         title.end(), atr, -1);

      for (std::sregex_token_iterator i = subregex_begin; i != std::sregex_token_iterator(); ++i)
      {
         std::cout << "got sub: '" << *i << "'" << std::endl;
      } // end for
   } // end scope

}

std::string parseCode( std::regex_constants::error_type etype)
{

   switch (etype)
   {
   case std::regex_constants::error_collate:
       return "error_collate: invalid collating element request";
   case std::regex_constants::error_ctype:
       return "error_ctype: invalid character class";
   case std::regex_constants::error_escape:
       return "error_escape: invalid escape character or trailing escape";
   case std::regex_constants::error_backref:
       return "error_backref: invalid back reference";
   case std::regex_constants::error_brack:
       return "error_brack: mismatched bracket([ or ])";
   case std::regex_constants::error_paren:
       return "error_paren: mismatched parentheses(( or ))";
   case std::regex_constants::error_brace:
       return "error_brace: mismatched brace({ or })";
   case std::regex_constants::error_badbrace:
       return "error_badbrace: invalid range inside a { }";
   case std::regex_constants::error_range:
       return "error_range: invalid character range(e.g., [z-a])";
   case std::regex_constants::error_space:
       return "error_space: insufficient memory to handle this regular expression";
   case std::regex_constants::error_badrepeat:
       return "error_badrepeat: a repetition character (*, ?, +, or {) was not preceded by a valid regular expression";
   case std::regex_constants::error_complexity:
       return "error_complexity: the requested match is too complex";
   case std::regex_constants::error_stack:
       return "error_stack: insufficient memory to evaluate a match";
   default:
       return "";
   }
}

Here is the output:

regex string = '(?:([^\[]+)(\[[^\]]*\])( +))'
got: 'First Title[]  '
got: 'Second Title[]   '
got sub: ''
got sub: ''
got sub: '-Third Title[]'

This is what I want/expect:

regex string = '(?:([^\[]+)(\[[^\]]*\])( +))'
got: 'First Title[]  '
got: 'Second Title[]   '
got: '-Third Title[]'
got sub: 'First  Title'
got sub: '[]'
got sub: '  '
got sub: 'Second  Title'
got sub: '[]'
got sub: '  '
got sub: '-Third Title'
got sub: '[]'

I am using g++ 5.3.1 on RHEL 7.2.
I got the same result with g++ 6.3 on IdeOne.com: https://ideone.com/dj4Mqf

What am I doing wrong?

Best answer

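1) Your regex does not match the last part, because "-Third Title[]" is followed by neither spaces nor "\n" but by the end of the string. Change it to: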

  const std::string  regex_str("([^\\[]+)(\\[[^\\]]*\\])(\\s+|\\n|$)");

2) match.str() returns the entire matched string. To extract the individual sub-matches (groups), use operator[]:

  std::smatch  match = *i;
  std::cout << "got: 1='" << match[1] << "' 2='" << match[2] << "' 3='" << match[3] << "'" << std::endl;

Output:

regex string = '([^\[]+)(\[[^\]]*\])(\s+|\n|$)'
got: 1='First Title' 2='[]' 3='  '
got: 1='Second Title' 2='[]' 3='   '
got: 1='-Third Title' 2='[]' 3=''

A similar question, "C++11 regex not returning groups as expected", can be found on Stack Overflow: https://stackoverflow.com/questions/47434643/
