java - Matching substrings that contain the separator within a full string

Tags: java regex pattern-matching match

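I'm not sure how to phrase this question. Long story short, I want to extract two strings (a, b) from lines of the form In: a (b). In almost all cases a = b, but just in case, I keep them separate. The problem: both strings can contain any character, including Unicode, whitespace, punctuation and parentheses.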

1: In: ThisName (ThisName) is in this list
2: In: OtherName (With These) (OtherName (With These)) is in this list
3: In: Really Annoying (Because) Separators (Really Annoying (Because) Separators) is in this list

Line 1, simple: ^\w+:\s(?'a'.+?)\s\((?'b'.+)\) gives a: ThisName, b: ThisName

Line 2, same regex: a: OtherName, b: With These) (OtherName (With These)

Line 2, lazy: ^\w+:\s(?'a'.+?)\s\((?'b'.+?)\) gives a: OtherName, b: With These

Line 3: both attempts fail completely.
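For reference when moving these attempts into Java: java.util.regex does not accept the (?'name'...) group syntax used on regex101 above; it expects (?<name>...). The sketch below (the class and method names are illustrative only) ports the two attempts and reproduces the behaviour described: the greedy b runs to the last ")" on the line, and the lazy b stops at the first ")".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionAttempts
{
    // Greedy b: runs to the LAST ")" on the line.
    static final Pattern GREEDY = Pattern.compile("^\\w+:\\s(?<a>.+?)\\s\\((?<b>.+)\\)");
    // Lazy b: stops at the FIRST ")" after a.
    static final Pattern LAZY = Pattern.compile("^\\w+:\\s(?<a>.+?)\\s\\((?<b>.+?)\\)");

    // Returns {a, b}, or null if the pattern does not match at the start of the line.
    static String[] extract(Pattern p, String line)
    {
        Matcher m = p.matcher(line);
        if (!m.find())
        {
            return null;
        }
        return new String[] { m.group("a"), m.group("b") };
    }

    public static void main(String[] args)
    {
        String line2 = "In: OtherName (With These) (OtherName (With These)) is in this list";
        String[] greedy = extract(GREEDY, line2);
        String[] lazy = extract(LAZY, line2);
        System.out.println("greedy: a=" + greedy[0] + " b=" + greedy[1]);
        System.out.println("lazy:   a=" + lazy[0] + " b=" + lazy[1]);
    }
}
```

For line 2 this prints a=OtherName with b=With These) (OtherName (With These) for the greedy pattern and b=With These for the lazy one, matching the results above.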

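Is this possible? Do I need to take a different approach? We know there has to be one set of parentheses around b. Maybe I have to go the math route: count the opening and closing parentheses and use the counts to work out which group actually contains b?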
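The counting idea can work under one extra assumption that the question does not guarantee: b's own parentheses are balanced. Then b is the group that ends at the final ")": walk backwards from it, counting depth, until the matching "(" is reached, and everything before that "(" (minus the separating space) is a. A minimal sketch; the class and method names are hypothetical:

```java
public class BalancedSplit
{
    /**
     * Splits "a (b)" into {a, b}, ASSUMING b's own parentheses are balanced.
     * Returns null if the text does not end in ")" or no matching "(" is found.
     */
    static String[] split(String aAndB)
    {
        int end = aAndB.length() - 1;
        if (end < 0 || aAndB.charAt(end) != ')')
        {
            return null;
        }
        int depth = 0;
        // Walk backwards from the final ")" until its matching "(" is reached.
        for (int i = end; i >= 0; i--)
        {
            char c = aAndB.charAt(i);
            if (c == ')')
            {
                depth++;
            }
            else if (c == '(')
            {
                depth--;
                if (depth == 0)
                {
                    // a must be separated from "(b)" by a single space.
                    if (i == 0 || aAndB.charAt(i - 1) != ' ')
                    {
                        return null;
                    }
                    return new String[] { aAndB.substring(0, i - 1), aAndB.substring(i + 1, end) };
                }
            }
        }
        return null; // more ")" than "(": no matching group
    }

    public static void main(String[] args)
    {
        String[] ab = split("Really Annoying (Because) Separators (Really Annoying (Because) Separators)");
        System.out.println("a = " + ab[0]);
        System.out.println("b = " + ab[1]);
    }
}
```

This resolves lines 1 to 3 without assuming a = b. It cannot help when b itself contains an unmatched parenthesis; in that case the input really is ambiguous.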

What I've been experimenting with: https://regex101.com/r/8YIweJ/2

By the way, if I could change the input format, I definitely would.

Follow-up question: if this is not possible, would always assuming a = b make it easier? I can't see how it would.
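Assuming a = b does make it easier: if the text is exactly a + " (" + a + ")", its length is n + 2 + n + 1 = 2n + 3, where n is the length of a, so the split point follows from the length alone, and a single string comparison confirms the assumption. This works even when a contains parentheses. A minimal sketch with hypothetical names:

```java
public class SameNameSplit
{
    /**
     * If aAndB is exactly a + " (" + a + ")", returns a; otherwise null.
     * Total length is 2n + 3, so n = (length - 3) / 2.
     */
    static String sameName(String aAndB)
    {
        int len = aAndB.length();
        if (len < 3 || (len - 3) % 2 != 0)
        {
            return null; // length cannot be 2n + 3
        }
        int n = (len - 3) / 2;
        String a = aAndB.substring(0, n);
        // Verify the assumption instead of trusting it.
        return aAndB.equals(a + " (" + a + ")") ? a : null;
    }

    public static void main(String[] args)
    {
        System.out.println(sameName("OtherName (With These) (OtherName (With These))"));
        System.out.println(sameName("Not the Same (NotTheSame)"));
    }
}
```

This prints OtherName (With These) and then null. Unlike a backreference pattern such as (.*) \(\1\), which may try many split points, it checks exactly one.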

Best answer

My comments are embedded in the processInput method.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseAB
{
    // Compile each pattern once instead of on every call to processInput.
    private static final Pattern INPUT_PATTERN = Pattern.compile("(\\d+): In: (.*) is in this list");
    // "a (a)": the backreference \1 forces b to be identical to a.
    private static final Pattern A_EQUALS_B_PATTERN = Pattern.compile("(.*) \\(\\1\\)");
    // "a (b)" where neither a nor b contains a parenthesis.
    private static final Pattern NO_PARENTHESES_PATTERN = Pattern.compile("([^()]*) \\(([^()]*)\\)");

    public static void main(String[] args)
    {
        String input = "1: In: ThisName (ThisName) is in this list\n" +
            "2: In: OtherName (With These) (OtherName (With These)) is in this list\n" +
            "3: In: Really Annoying (Because) Separators (Really Annoying (Because) Separators) is in this list\n" +
            "4: In: Not the Same (NotTheSame) is in this list\n" +
            "5: In: A = (B) (A = (B)) is in this list\n" +
            "6: In: A != (B) (A != B) is in this list\n";

        for (String line : input.split("\n"))
        {
            processInput(line);
        }
    }

    public static void processInput(String line)
    {
        // Parse the relevant part from the input.
        Matcher inputMatcher = INPUT_PATTERN.matcher(line);
        if (!inputMatcher.matches())
        {
            System.out.println(line + " is not valid input");
            return;
        }
        String inputNum = inputMatcher.group(1);
        String aAndB = inputMatcher.group(2);

        // Check if a = b.
        Matcher aEqualsBMatcher = A_EQUALS_B_PATTERN.matcher(aAndB);
        if (aEqualsBMatcher.matches())
        {
            System.out.println("Input " + inputNum + ":");
            System.out.println("a = b = " + aEqualsBMatcher.group(1));
            System.out.println();
            return;
        }

        // Check if a and b have no parentheses.
        Matcher noParenthesesMatcher = NO_PARENTHESES_PATTERN.matcher(aAndB);
        if (noParenthesesMatcher.matches())
        {
            System.out.println("Input " + inputNum + ":");
            System.out.println("a = " + noParenthesesMatcher.group(1));
            System.out.println("b = " + noParenthesesMatcher.group(2));
            System.out.println();
            return;
        }

        // a and b have one or more parentheses in them.
        // All you can do now is guess what a and b are:
        // treat each " (" in turn as the separator between a and b.
        String[] split = aAndB.split(" \\(");
        if (split.length < 2 || !aAndB.endsWith(")"))
        {
            // No " (" separator or no closing ")", so there is no candidate split.
            System.out.println(line + " cannot be split into \"a (b)\"");
            return;
        }
        for (int i = 0; i < split.length - 1; i++)
        {
            System.out.println("Possible Input " + inputNum + ":");
            System.out.println("possible a = " + mergeParts(split, 0, i));
            System.out.println("possible b = " + mergeParts(split, i + 1, split.length - 1));
            System.out.println();
        }
    }

    // Rejoins parts startIndex..endIndex with the " (" that split() removed.
    private static String mergeParts(String[] aAndBParts, int startIndex, int endIndex)
    {
        StringBuilder s = new StringBuilder(getPart(aAndBParts, startIndex));
        for (int j = startIndex + 1; j <= endIndex; j++)
        {
            s.append(" (");
            s.append(getPart(aAndBParts, j));
        }
        return s.toString();
    }

    // Returns part j; the last part still ends with the ")" that closes b, so that character is dropped.
    private static String getPart(String[] aAndBParts, int j)
    {
        if (j != aAndBParts.length - 1)
        {
            return aAndBParts[j];
        }
        return aAndBParts[j].substring(0, aAndBParts[j].length() - 1);
    }
}

Running the code above prints:

Input 1:
a = b = ThisName

Input 2:
a = b = OtherName (With These)

Input 3:
a = b = Really Annoying (Because) Separators

Input 4:
a = Not the Same
b = NotTheSame

Input 5:
a = b = A = (B)

Possible Input 6:
possible a = A !=
possible b = B) (A != B

Possible Input 6:
possible a = A != (B)
possible b = A != B
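The two guesses for input 6 can be narrowed if one further assumes (the question does not state this) that neither a nor b contains an unmatched parenthesis. The first candidate's b, "B) (A != B", closes a parenthesis before opening one, so it can be discarded. A minimal sketch of that filter; the names are hypothetical, and like the answer it only considers " (" as a separator:

```java
import java.util.ArrayList;
import java.util.List;

public class CandidateFilter
{
    /** True if every ")" closes an earlier "(" and every "(" is closed. */
    static boolean balanced(String s)
    {
        int depth = 0;
        for (char c : s.toCharArray())
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
                if (depth < 0)
                {
                    return false;
                }
            }
        }
        return depth == 0;
    }

    /** Tries every " (" as the separator and keeps splits where both a and b are balanced. */
    static List<String[]> candidates(String aAndB)
    {
        List<String[]> result = new ArrayList<>();
        if (!aAndB.endsWith(")"))
        {
            return result;
        }
        int from = 0;
        int idx;
        while ((idx = aAndB.indexOf(" (", from)) >= 0)
        {
            String a = aAndB.substring(0, idx);
            // b lies between " (" and the final ")".
            String b = aAndB.substring(idx + 2, aAndB.length() - 1);
            if (balanced(a) && balanced(b))
            {
                result.add(new String[] { a, b });
            }
            from = idx + 1;
        }
        return result;
    }

    public static void main(String[] args)
    {
        for (String[] ab : candidates("A != (B) (A != B)"))
        {
            System.out.println("a = " + ab[0] + ", b = " + ab[1]);
        }
    }
}
```

For input 6 only a = A != (B), b = A != B survives. If more than one candidate survives, the input is genuinely ambiguous and some other rule is needed.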

For java - Matching substrings that contain the separator within a full string, see the original question on Stack Overflow: https://stackoverflow.com/questions/41594662/
