I have the following text:
Attorney General William Barr said the volume of information compromised was “staggering” and the largest breach in U.S. history.“This theft not only caused significant financial damage to Equifax but invaded the privacy of many, millions of Americans and imposed substantial costs and burdens on them as they had to take measures to protect themselves from identity theft,” said Mr. Barr.
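I want to match the text inside the quotation marks, but a quote must be at least 5 words long; otherwise it should be ignored.
Currently, I am using the following regular expression: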
(?<=[\\“|\\"])[A-Za-z0-9\.\-][A-Za-z\s,:\\’]+(?=[\”|\"])
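However, this also matches the quote “staggering”, which is only 1 word long and should therefore be ignored.
I realize I could accomplish this by repeating this part of the regex 5 times: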
[A-Za-z\s,:\\’]+[A-Za-z\s,:\\’]+[A-Za-z\s,:\\’]+[A-Za-z\s,:\\’]+[A-Za-z\s,:\\’]+
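However, I was wondering whether there is a shorter, more concise way to achieve this? Perhaps by forcing the \s inside the [] to occur at least 5 times?
Thanks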
Best answer
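You need to "unroll" the character class by taking the whitespace pattern out of it, and use a pattern like [<chars>]+(?:\s+[<chars>]+){4,}. That is one word followed by four or more whitespace-separated words, i.e. at least 5 words in total. Note that you should not use lookarounds here, because " can be both a leading and a trailing marker, which may result in unwanted matches. Use a capturing group instead and access its value via matcher.group(1).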
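To illustrate the lookaround caveat, here is a minimal sketch. The class name, the method names, the sample sentence and the simplified quote body (letters, whitespace, comma, colon) are assumptions made for this illustration only. Because lookarounds do not consume the quotation marks, the closing " of one quote can serve as the opening " of the next "match":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundPitfall {
    // Simplified quote body, for illustration only: letters, whitespace, comma, colon
    static final String BODY = "[A-Za-z\\s,:]+";

    // Lookarounds only check for the quotes, they do not consume them
    static List<String> withLookarounds(String text) {
        Matcher m = Pattern.compile("(?<=\")" + BODY + "(?=\")").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // The capturing-group version consumes both quotes, so they cannot be reused
    static List<String> withGroup(String text) {
        Matcher m = Pattern.compile("\"(" + BODY + ")\"").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\"Yes\" was all that he said \"No\"";
        // 3 matches: the text between the two quotes is also reported
        System.out.println(withLookarounds(text));
        // 2 matches: only Yes and No
        System.out.println(withGroup(text));
    }
}
```

The unquoted text between the two quotes is returned by the lookaround version but not by the capturing-group version.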
You can use
String regex = "[“\"]([A-Za-z0-9.-][A-Za-z,:’]*(?:\\s+[A-Za-z0-9.-][A-Za-z,:’]*){4,})[”\"]";
See the regex demo.
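As a rough sketch of why the whitespace has to be moved out of the character class: repeating a class that still contains \s five times only guarantees five or more characters, not five words. The class name, helper names and the simplified word characters below are assumptions for illustration:

```java
import java.util.regex.Pattern;

class UnrollDemo {
    // Whitespace stays inside the class and the class is repeated five times.
    // String.repeat requires Java 11 or later.
    static final Pattern REPEATED =
        Pattern.compile("[A-Za-z\\s,:]+".repeat(5));

    // Whitespace moved out of the class: one word, then 4 or more further words
    static final Pattern UNROLLED =
        Pattern.compile("[A-Za-z,:]+(?:\\s+[A-Za-z,:]+){4,}");

    static boolean repeated(String s) {
        return REPEATED.matcher(s).matches();
    }

    static boolean unrolled(String s) {
        return UNROLLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(repeated("staggering"));              // true: 10 characters split 5 ways
        System.out.println(unrolled("staggering"));              // false: only 1 word
        System.out.println(unrolled("one two three four"));      // false: only 4 words
        System.out.println(unrolled("one two three four five")); // true: 5 words
    }
}
```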
Then, just grab the Group 1 value:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "Attorney General William Barr said the volume of information compromised was “staggering” and the largest breach in U.S. history.“This theft not only caused significant financial damage to Equifax but invaded the privacy of many, millions of Americans and imposed substantial costs and burdens on them as they had to take measures to protect themselves from identity theft,” said Mr. Barr.";
// Group 1 captures the quoted text; {4,} requires 4+ further words after the first one
String regex = "[“\"]([A-Za-z0-9.-][A-Za-z,:’]*(?:\\s+[A-Za-z0-9.-][A-Za-z,:’]*){4,})[”\"]";
Matcher m = Pattern.compile(regex).matcher(line);
List<String> res = new ArrayList<>();
while (m.find()) {
    res.add(m.group(1)); // keep only the text between the quotation marks
}
System.out.println(res);
See the online Java demo.
Pattern details
- [“"] - “ or "
- ([A-Za-z0-9.-][A-Za-z,:’]*(?:\s+[A-Za-z0-9.-][A-Za-z,:’]*){4,}) - Group 1:
  - [A-Za-z0-9.-][A-Za-z,:’]* - an ASCII alphanumeric, . or -, followed by 0 or more ASCII letters, , : or ’ characters
  - (?:\s+[A-Za-z0-9.-][A-Za-z,:’]*){4,} - four or more occurrences of:
    - \s+ - 1 or more whitespace characters
    - [A-Za-z0-9.-][A-Za-z,:’]* - an ASCII alphanumeric, . or -, followed by 0 or more ASCII letters, , : or ’ characters
- [”"] - " or ”
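Since the word count is controlled entirely by the {4,} quantifier, the minimum can be made configurable. The following is a minimal sketch under that assumption; QuoteFinder and quotesWithAtLeast are hypothetical names, the sample sentence is made up, and the curly quotes are written as Unicode escapes only to keep the source file encoding-independent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteFinder {
    // One "word" as defined in the pattern details (U+2019 is the curly apostrophe)
    static final String WORD = "[A-Za-z0-9.-][A-Za-z,:\u2019]*";

    // Hypothetical helper: returns the quoted texts that contain at least minWords words
    static List<String> quotesWithAtLeast(String text, int minWords) {
        if (minWords < 1) {
            throw new IllegalArgumentException("minWords must be >= 1");
        }
        // First word, then (minWords - 1) or more whitespace-separated words
        String regex = "[\u201C\"](" + WORD + "(?:\\s+" + WORD + "){" + (minWords - 1) + ",})[\u201D\"]";
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Barr called it \u201Cstaggering\u201D and said "
            + "\u201CThis theft not only caused significant damage,\u201D he added.";
        System.out.println(quotesWithAtLeast(text, 5)); // only the 7-word quote
        System.out.println(quotesWithAtLeast(text, 1)); // both quotes
    }
}
```

With minWords set to 5 this builds the same {4,} quantifier as the pattern above.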
Regarding "java - regex to match quotes with a minimum word count", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60310558/