I want to allow the two main wildcards, ? and *, to filter my data.
Here is how I do it now (as I have seen on many websites):
public boolean contains(String data, String filter) {
    if (data == null || data.isEmpty()) {
        return false;
    }
    String regex = filter.replace(".", "[.]")
                         .replace("?", ".")
                         .replace("*", ".*");
    return Pattern.matches(regex, data);
}
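As a quick illustration of why this version is fragile, here is a minimal sketch that runs the same conversion on filters containing other regex metacharacters. The class name NaiveWildcardDemo and the sample inputs are made up for the example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NaiveWildcardDemo {
    // Same conversion as above: only '.', '?' and '*' are handled.
    static boolean contains(String data, String filter) {
        if (data == null || data.isEmpty()) {
            return false;
        }
        String regex = filter.replace(".", "[.]")
                             .replace("?", ".")
                             .replace("*", ".*");
        return Pattern.matches(regex, data);
    }

    public static void main(String[] args) {
        // '|' is left as regex alternation, so the filter "a|b" matches "b"...
        System.out.println(contains("b", "a|b"));   // true
        // ...and no longer matches the literal text "a|b".
        System.out.println(contains("a|b", "a|b")); // false
        // An unbalanced '(' is not even a valid regex.
        try {
            contains("x", "(");
        } catch (PatternSyntaxException e) {
            System.out.println("invalid pattern: " + e.getDescription());
        }
    }
}
```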
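But shouldn't we escape all the other regex special characters as well, such as | or (? Also, maybe we could keep ? and * literal when they are preceded by a \ ? For example, something like: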
filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, but ?, * and \
.replaceAll("([^\\\\]|^)\\?", "$1.") // 2. replace any ? that isn't preceded by a \ by .
.replaceAll("([^\\\\]|^)\\*", "$1.*") // 3. replace any * that isn't preceded by a \ by .*
.replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1"); // 4. replace any \ that isn't followed by a ? or a * (possibly due to step 2 and 3) by \\
What do you think? If you agree, am I missing any other regex special characters?
Edit #1 (after taking dan1111's and m.buettner's suggestions into account):
// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars, but \, ? and *
regex = regex.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
How about this one?
Edit #2 (after taking dan1111's suggestions into account):
// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars (if not already escaped by user), but \, ? and *
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
Is the goal in sight?
Best answer
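In Java, the four backslashes in your replacement strings are actually needed: replaceAll treats a backslash in the replacement as an escape character, so "\\\\$1" in source writes one backslash followed by group 1, whereas "\\$1" would write the literal text $1.
You can, however, avoid the ([^\\\\]|^) and the $1 in the replacement string by using a negative lookbehind:
filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, but ?, * and \
.replaceAll("(?<!\\\\)[?]", ".") // 2. replace any ? that isn't preceded by a \ by .
.replaceAll("(?<!\\\\)[*]", ".*"); // 3. replace any * that isn't preceded by a \ by .*
I don't really see what you need the last step for. Wouldn't it escape the backslashes that escape your metacharacters, which in turn means the metacharacters are no longer escaped? Say your original input is th|is. Your first replacement turns that into th\|is. The last replacement then turns it into th\\|is, which matches either th followed by a backslash, or is.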
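The th|is example can be traced with a small sketch that applies only step 1 and step 4 from the question. The class and method names (LastStepDemo, escapeSpecials, doubleBackslashes) are hypothetical; the two regexes are copied from the question.

```java
import java.util.regex.Pattern;

class LastStepDemo {
    // Step 1 from the question: put a backslash in front of regex metacharacters.
    static String escapeSpecials(String filter) {
        return filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1");
    }

    // Step 4 from the question: double any backslash not followed by ? or *.
    static String doubleBackslashes(String regex) {
        return regex.replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1");
    }

    public static void main(String[] args) {
        String step1 = escapeSpecials("th|is");
        String step4 = doubleBackslashes(step1);
        System.out.println(step1); // th\|is
        System.out.println(step4); // th\\|is

        // After step 1 the pipe is a literal character.
        System.out.println(Pattern.matches(step1, "th|is")); // true
        // After step 4 the pipe is an alternation again.
        System.out.println(Pattern.matches(step4, "th|is")); // false
        System.out.println(Pattern.matches(step4, "th\\"));  // true
        System.out.println(Pattern.matches(step4, "is"));    // true
    }
}
```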
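You need to distinguish between how your string looks in the source code (uncompiled, with twice as many backslashes) and how it looks after compilation (with only half as many backslashes).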
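A short sketch of these layers in Java, under the assumption that only the | character needs escaping (escapePipe and BackslashLayersDemo are made-up names). Note that in Java the replacement argument of replaceAll adds one more layer, because a backslash there escapes the next character.

```java
class BackslashLayersDemo {
    // Hypothetical helper: prefixes every '|' with one backslash.
    static String escapePipe(String s) {
        // Source "\\\\$1" is the run-time string \\$1, which replaceAll
        // interprets as: one literal backslash, then group 1.
        return s.replaceAll("([|])", "\\\\$1");
    }

    public static void main(String[] args) {
        // Four backslashes in source are two characters at run time.
        System.out.println("\\\\".length());                   // 2
        // As a regex, those two characters match one literal backslash.
        System.out.println("\\".matches("\\\\"));              // true
        // One backslash ends up in front of the pipe.
        System.out.println(escapePipe("a|b"));                 // a\|b
        // With only two backslashes in source, the '$' is escaped instead.
        System.out.println("a|b".replaceAll("([|])", "\\$1")); // a$1b
    }
}
```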
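You might also want to consider limiting the number of * that can occur. A regex like .*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*! (where ! is not found in the input) can take a very long time to run. This problem is known as catastrophic backtracking.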
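One way to keep runs of * from piling up is to build the regex one character at a time rather than with chained replaceAll calls. The sketch below is not the approach from the question; it is a hypothetical alternative (WildcardToRegex, toRegex and matches are made-up names) that assumes a backslash makes the next character literal, collapses adjacent * into a single .*, and quotes everything else with Pattern.quote.

```java
import java.util.regex.Pattern;

class WildcardToRegex {
    // Hypothetical helper: converts a wildcard filter to a regex string.
    static String toRegex(String filter) {
        StringBuilder regex = new StringBuilder();
        boolean lastWasStar = false;
        for (int i = 0; i < filter.length(); i++) {
            char c = filter.charAt(i);
            if (c == '*') {
                // Collapse a run of adjacent '*' into a single ".*".
                if (!lastWasStar) {
                    regex.append(".*");
                }
                lastWasStar = true;
                continue;
            }
            lastWasStar = false;
            if (c == '?') {
                regex.append('.');
            } else {
                // Assumed escape rule: a backslash makes the next character
                // literal; a trailing backslash stays a literal backslash.
                if (c == '\\' && i + 1 < filter.length()) {
                    c = filter.charAt(++i);
                }
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    static boolean matches(String data, String filter) {
        return Pattern.matches(toRegex(filter), data);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("**a**"));          // .*\Qa\E.*
        System.out.println(matches("th|is", "th|is")); // true
        System.out.println(matches("a*b", "a\\*b"));   // true
        System.out.println(matches("axb", "a\\*b"));   // false
    }
}
```

Only directly adjacent * are collapsed here; mixed runs such as *?* are left alone, so a cap on the number of wildcards may still be worth adding.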
Source: "java - From wildcards to regular expressions" on Stack Overflow: https://stackoverflow.com/questions/13862667/