java - Regex: match everything up to an optional capture group

Tags: java regex

I have the following regex:

(.*)(?:([\+\-\*\/])(-?\d+(?:\.\d+)?))

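The intent is to capture mathematical expressions of the form (left expression)(operator)(right operand). For example, 1+2+3 would be captured as (1+2)(+)(3). It also handles an expression with just one operation, for example 1+2 would be captured as (1)(+)(2).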
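To see the question's regex in action, here is a minimal sketch. It assumes the whole input is matched with Matcher.matches(), because the question does not say whether matches() or find() is used. The class QuestionRegexDemo and the method split are illustrative names, not from the original.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionRegexDemo {
    // The question's regex, with backslashes doubled for a Java string literal
    static final Pattern ORIGINAL =
        Pattern.compile("(.*)(?:([\\+\\-\\*\\/])(-?\\d+(?:\\.\\d+)?))");

    // Returns groups 1..3 if the whole input matches, or null otherwise
    static String[] split(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2+3", "1+2", "5"}) {
            System.out.println(s + " -> " + Arrays.toString(split(s)));
        }
    }
}
```

Running main prints 1+2+3 -> [1+2, +, 3], 1+2 -> [1, +, 2] and 5 -> null, which is the single-operand failure described next.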

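The problem I'm having is that this regex does not match a single operand with no operator. For example, 5 should be matched in the first capture group with nothing in the second and third: (5)()(). If I make the last part optional: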

(.*)(?:([\+\-\*\/])(-?\d+(?:\.\d+)?))?

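then the first group always captures the entire expression. Is there any way to make the second part optional, but have it take precedence over the greedy matching done by the first group?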
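The behaviour described above can be reproduced with the sketch below, again assuming Matcher.matches(). It also tries one commonly suggested alternative that is not part of the answer below: making the first group reluctant, (.*?), so that the optional operator/operand part claims the last operation whenever the rest of the input allows it. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // The question's second attempt: the operator/operand part is optional,
    // so the greedy (.*) can simply consume the whole input
    static final Pattern GREEDY =
        Pattern.compile("(.*)(?:([\\+\\-\\*\\/])(-?\\d+(?:\\.\\d+)?))?");

    // Assumed alternative (not from the answer): a reluctant (.*?) grows one
    // character at a time, so the optional part is tried at every position
    static final Pattern RELUCTANT =
        Pattern.compile("(.*?)(?:([\\+\\-\\*\\/])(-?\\d+(?:\\.\\d+)?))?");

    // Returns groups 1..3 (null for groups that did not participate),
    // or null if the whole input does not match
    static String[] split(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2+3", "5", "-3"}) {
            System.out.println(s + " greedy:    " + Arrays.toString(split(GREEDY, s)));
            System.out.println(s + " reluctant: " + Arrays.toString(split(RELUCTANT, s)));
        }
    }
}
```

With GREEDY, 1+2+3 puts everything in group 1. With RELUCTANT, 1+2+3 splits as (1+2)(+)(3) and 5 as (5)()(), but a leading sign is taken as an operator: -3 splits as ()(-)(3). The answer below avoids that case by allowing an explicit sign on each number.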

Best answer

Description

This regex will:

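  • Capture the mathematical expression up to the last operation
  • Capture the last operation
  • Capture the last number in the mathematical expression
  • Assume each number may have a plus or minus sign indicating whether it is positive or negative
  • Assume each number may be a non-integer
  • Assume the expression may contain any number of operations, e.g. 1+2, 1+2+3, 1+2+3+4, and so on
  • Validate that the string is a mathematical expression. Some edge cases, such as parentheses or other complex mathematical notation, are not handled here.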
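A quick way to check the points above is to run the answer's regex (given below) over a handful of inputs. This is a sketch assuming Matcher.matches(); since the pattern is already anchored with ^ and $, find() behaves the same for these single-line inputs. AnswerRegexDemo and split are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerRegexDemo {
    // The answer's regex as a Java string literal (backslashes doubled)
    static final Pattern ANSWER = Pattern.compile(
        "^(?=(?:[-+*/^]?[-+]?\\d+(?:[.]\\d+)?)*$)"
        + "([-+]?[0-9.]+$|[-+]?[0-9.]+(?:[-+*/^][-+]?[0-9.]+)*(?=[-+*/^]))"
        + "(?:([-+*/^])([-+]?[0-9.]+))?$");

    // Returns groups 1..3 (null for absent groups), or null if rejected
    static String[] split(String input) {
        Matcher m = ANSWER.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] inputs = {"5", "+7", "1.5*-2", "1+2+3+4", "2^10", "1+(2)"};
        for (String s : inputs) {
            System.out.println(s + " -> " + Arrays.toString(split(s)));
        }
    }
}
```

Absent groups come back as null, so 5 yields [5, null, null] and 1+2+3+4 yields [1+2+3, +, 4], while 1+(2) is rejected because parentheses are not part of the accepted syntax.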

Raw regex

Note that this is Java, so you need to escape the backslashes in this regex: simply replace every \ with \\.

^(?=(?:[-+*/^]?[-+]?\d+(?:[.]\d+)?)*$)([-+]?[0-9.]+$|[-+]?[0-9.]+(?:[-+*/^][-+]?[0-9.]+)*(?=[-+*/^]))(?:([-+*/^])([-+]?[0-9.]+))?$

Explanation

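Overview

In this expression I first validate that the string consists only of the operations -+/*^, optional signs -+, and integer or non-integer numbers. Because the string has already been validated, the rest of the expression can refer to a number simply as [0-9.]+, which improves readability.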
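The benefit of the validating lookahead can be shown by comparing the loose number class [-+]?[0-9.]+ on its own with the full pattern. A minimal sketch; ValidationDemo, looseNumber and fullMatch are hypothetical names.

```java
import java.util.regex.Pattern;

class ValidationDemo {
    // The answer's full pattern (Java string literal, backslashes doubled)
    static final Pattern FULL = Pattern.compile(
        "^(?=(?:[-+*/^]?[-+]?\\d+(?:[.]\\d+)?)*$)"
        + "([-+]?[0-9.]+$|[-+]?[0-9.]+(?:[-+*/^][-+]?[0-9.]+)*(?=[-+*/^]))"
        + "(?:([-+*/^])([-+]?[0-9.]+))?$");

    // The loose number class used after validation, applied on its own
    static boolean looseNumber(String s) {
        return s.matches("[-+]?[0-9.]+");
    }

    // True only if the whole string passes the lookahead and the rest of the pattern
    static boolean fullMatch(String s) {
        return FULL.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.25", "1.2.3", "..", "3."}) {
            System.out.println(s + ": loose=" + looseNumber(s) + ", full=" + fullMatch(s));
        }
    }
}
```

Inputs such as 1.2.3, .. or 3. satisfy the loose class but are rejected by the lookahead, which requires digits on both sides of a decimal point.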

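Capture groups

  • Group 0 gets the entire string
  • Group 1 gets the entire string except the last operation; if there is no operation, group 1 holds the entire string
  • Group 2 gets the last operation, if there is one
  • Group 3 gets the (optionally signed) number after the last operation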
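If numbered groups are hard to keep track of, Java also supports named groups, (?<name>...). The sketch below is the same pattern with groups 1 to 3 given the hypothetical names left, op and right; the numbering is unchanged. It also shows one way to turn non-participating groups (which Matcher.group returns as null) into empty strings, producing the (5)()() form the question asks for.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // The answer's pattern with groups 1-3 named left, op and right
    // (hypothetical names); group numbers 1, 2 and 3 still work as before
    static final Pattern NAMED = Pattern.compile(
        "^(?=(?:[-+*/^]?[-+]?\\d+(?:[.]\\d+)?)*$)"
        + "(?<left>[-+]?[0-9.]+$|[-+]?[0-9.]+(?:[-+*/^][-+]?[0-9.]+)*(?=[-+*/^]))"
        + "(?:(?<op>[-+*/^])(?<right>[-+]?[0-9.]+))?$");

    // Formats the match as (left)(op)(right); groups that did not
    // participate are null, so they are replaced with ""
    static String describe(String input) {
        Matcher m = NAMED.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return "(" + m.group("left") + ")("
            + Objects.toString(m.group("op"), "") + ")("
            + Objects.toString(m.group("right"), "") + ")";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2+-3", "-3", "5", "1+*2"}) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

For example, describe("1+2+-3") returns (1+2)(+)(-3) and describe("5") returns (5)()().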

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      [-+*/^]?                 any character of: '-', '+', '*', '/',
                               '^' (optional (matching the most
                               amount possible))
----------------------------------------------------------------------
      [-+]?                    any character of: '-', '+' (optional
                               (matching the most amount possible))
----------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times
                                 (matching the most amount possible))
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [-+]?                    any character of: '-', '+' (optional
                             (matching the most amount possible))
----------------------------------------------------------------------
    [0-9.]+                  any character of: '0' to '9', '.' (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [-+]?                    any character of: '-', '+' (optional
                             (matching the most amount possible))
----------------------------------------------------------------------
    [0-9.]+                  any character of: '0' to '9', '.' (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      [-+*/^]                  any character of: '-', '+', '*', '/',
                               '^'
----------------------------------------------------------------------
      [-+]?                    any character of: '-', '+' (optional
                               (matching the most amount possible))
----------------------------------------------------------------------
      [0-9.]+                  any character of: '0' to '9', '.' (1
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      [-+*/^]                  any character of: '-', '+', '*', '/',
                               '^'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      [-+*/^]                  any character of: '-', '+', '*', '/',
                               '^'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      [-+]?                    any character of: '-', '+' (optional
                               (matching the most amount possible))
----------------------------------------------------------------------
      [0-9.]+                  any character of: '0' to '9', '.' (1
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------

Examples

Sample text

1+2+-3

Sample capture groups

[0] = 1+2+-3
[1] = 1+2
[2] = +
[3] = -3

Online demo: http://fiddle.re/b2w5wa

Sample text

-3

Sample capture groups

[0] = -3
[1] = -3
[2] = 
[3] = 

Online demo: http://fiddle.re/07kqra

Sample Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] args) {
    // Sample input; replace with the expression you want to split
    String sourceString = "1+2+-3";
    // Backslashes are doubled because the pattern is a Java string literal
    Pattern re = Pattern.compile("^(?=(?:[-+*/^]?[-+]?\\d+(?:[.]\\d+)?)*$)([-+]?[0-9.]+$|[-+]?[0-9.]+(?:[-+*/^][-+]?[0-9.]+)*(?=[-+*/^]))(?:([-+*/^])([-+]?[0-9.]+))?$");
    Matcher m = re.matcher(sourceString);
    int mIdx = 0;
    while (m.find()) {
      // groupCount() does not include group 0, so loop up to and including it;
      // optional groups that did not participate print as null
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}
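Because group 1 is itself an expression, the pattern lends itself to recursion. The following is a hypothetical evaluator, not part of the original answer, that applies the last operation to the value of group 1. Assumptions: operations are applied strictly left to right with no precedence (so 1+2*3 gives 9, not 7), ^ is treated as exponentiation via Math.pow, and numbers are parsed with Double.parseDouble, which accepts a leading + or -.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftToRightEvaluator {
    // The answer's regex as a Java string literal
    static final Pattern ANSWER = Pattern.compile(
        "^(?=(?:[-+*/^]?[-+]?\\d+(?:[.]\\d+)?)*$)"
        + "([-+]?[0-9.]+$|[-+]?[0-9.]+(?:[-+*/^][-+]?[0-9.]+)*(?=[-+*/^]))"
        + "(?:([-+*/^])([-+]?[0-9.]+))?$");

    // Hypothetical helper: evaluates strictly left to right (no operator
    // precedence) by recursing on group 1, the expression before the last operation
    static double evaluate(String expr) {
        Matcher m = ANSWER.matcher(expr);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a supported expression: " + expr);
        }
        if (m.group(2) == null) {
            // No operation: group 1 is a single, optionally signed number
            return Double.parseDouble(m.group(1));
        }
        double left = evaluate(m.group(1));
        double right = Double.parseDouble(m.group(3));
        switch (m.group(2)) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;
            case "^": return Math.pow(left, right); // assumed meaning of ^
            default: throw new IllegalStateException("Unexpected operator: " + m.group(2));
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2+-3", "1+2*3", "-3", "7/2"}) {
            System.out.println(s + " = " + evaluate(s));
        }
    }
}
```

Anything that needs real operator precedence or parentheses is better served by a small parser than by stretching this regex further.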

A similar question, "java - Regex: match everything up to an optional capture group", can be found on Stack Overflow: https://stackoverflow.com/questions/36828548/
