REGEX - Match any character which repeats n times

How do I match any character which repeats n times?

Example:

for input: abcdbcdcdd
for n=1:   ..........
for n=2:    .........
for n=3:     .. .....
for n=4:      .  . ..
for n=5:   no matches
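
For reference, the expected rows above can be produced without any regex. This is a minimal sketch in Python, assuming "repeats n times" means the character occurs at least n times anywhere in the input. The names `expected_positions` and `marker_row` are made up for illustration.

```python
from collections import Counter

def expected_positions(text, n):
    """Indices of every character that occurs at least n times in text."""
    counts = Counter(text)
    return [i for i, ch in enumerate(text) if counts[ch] >= n]

def marker_row(text, n):
    """Render the positions as a row of dots, like the table above."""
    hits = set(expected_positions(text, n))
    row = "".join("." if i in hits else " " for i in range(len(text)))
    return row.rstrip()

if __name__ == "__main__":
    for n in range(1, 6):
        print("for n=%d:   %s" % (n, marker_row("abcdbcdcdd", n) or "no matches"))
```

The rest of the question is about getting the same positions from a regex alone.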

After several hours, the best I have come up with is this expression:

(\w)(?=(?:.*\1){n-1,}) //where n is variable

It uses a lookahead. However, the problem with this expression is:

for input: abcdbcdcdd
for n=1    .......... 
for n=2     ... .. .
for n=3      ..  .
for n=4       .
for n=5    no matches

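As you can see, the lookahead only counts occurrences ahead of the current character. Look at the n=4 line: the lookahead assertion is satisfied for the first d, so the regex matches it. But the remaining d's are not matched, because they don't have 3 more d's ahead of them.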
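
The shortfall can be reproduced with Python's standard `re` module. This is only a sketch: the function name `lookahead_only_positions` is hypothetical, and the pattern is the one above with n-1 substituted into the quantifier.

```python
import re

def lookahead_only_positions(text, n):
    """Positions matched by the lookahead-only pattern for a given n."""
    # (\w)(?=(?:.*\1){n-1,}) with the value n-1 filled in.
    pattern = r"(\w)(?=(?:.*\1){%d,})" % (n - 1)
    return [m.start() for m in re.finditer(pattern, text)]

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, lookahead_only_positions("abcdbcdcdd", n))
```

For n=4 this yields only index 3, the first d, even though all four d's should be reported.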

I hope I have stated the problem clearly. I'm hoping for your solutions, thanks in advance.

Best answer

let's look for n=4 line, d's lookahead assertion satisfied and first d matched by regex. But remaining d's are not matched because they don't have 3 more d's ahead of them.

And obviously, without regex, this is a very simple string manipulation problem. I'm trying to do this with and only with regex.

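As with any regex implementation, the answer depends on the regex flavour. You could create a solution with the .net regex engine, since it allows variable-width lookbehinds.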

Also, I'll provide a more generic solution below for Perl-compatible/like regex flavours.

.net solution

As @PetSerAl pointed out in his answer, with variable-width lookbehinds we can assert back to the beginning of the string and check that there are n occurrences.
ideone demo

The regex module in Python
You can implement this solution in python using the regex module by Matthew Barnett, which also allows variable-width lookbehinds.
>>> import regex
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){2})\A.*)', 'abcdbcdcdd')
['b', 'c', 'd', 'b', 'c', 'd', 'c', 'd', 'd']
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){3})\A.*)', 'abcdbcdcdd')
['c', 'd', 'c', 'd', 'c', 'd', 'd']
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){4})\A.*)', 'abcdbcdcdd')
['d', 'd', 'd', 'd']
>>> regex.findall( r'(\w)(?<=(?=(?>.*?\1){5})\A.*)', 'abcdbcdcdd')
[]

Generic solution

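In pcre or any of the "perl-like" flavours, there is no solution that would return a match for every repeated character, but we could create one, and only one, capture for each character.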

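Strategy

For any given n, the logic involves:

Early matches: match and capture every character followed by at least n more occurrences.

Final captures:

Match and capture a character followed by exactly n-1 more occurrences, and

also capture every one of those following occurrences.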

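Example
for n = 3
input = abcdbcdcdd
The character c is Matched (M) only once (as final), and the following 2 occurrences are also Captured (C) in the same match:
abcdbcdcdd
  M  C C
and the character d is (early) Matched once:
abcdbcdcdd
   M
and (finally) Matched one more time, Capturing the rest:
abcdbcdcdd
      M CC

Regex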
/(\w)                        # match 1 character
(?:
    (?=(?:.*?\1){≪N≫})     # [1] followed by other ≪N≫ occurrences
  |                          #   OR
    (?=                      # [2] followed by:
        (?:(?!\1).)*(\1)     #      2nd occurrence <captured>
        (?:(?!\1).)*(\1)     #      3rd occurrence <captured>
        ≪repeat previous≫  #      repeat subpattern (n-1) times
                             #     *exactly (n-1) times*
        (?!.*?\1)            #     not followed by another occurrence
    )
)/xg
For specific values of n (the value is the quantifier in the first lookahead):

/(\w)(?:(?=(?:.*?\1){2})|(?=(?:(?!\1).)*(\1)(?!.*?\1)))/g
demo

/(\w)(?:(?=(?:.*?\1){3})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))/g
demo

/(\w)(?:(?=(?:.*?\1){4})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))/g
demo

...and so on
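
As a check on these patterns, the n = 3 one can be run with Python's standard `re` module, which keeps captures made inside a positive lookahead. The sketch below writes it in verbose form and redraws the M/C rows from the example. The names `PATTERN_N3` and `diagram` are illustrative only.

```python
import re

# Verbose form of the n = 3 pattern.
PATTERN_N3 = re.compile(r"""
    (\w)                          # match 1 character
    (?:
        (?=(?:.*?\1){3})          # [1] followed by 3 more occurrences
      |
        (?=                       # [2] followed by exactly 2 more:
            (?:(?!\1).)*(\1)      #     2nd occurrence, captured
            (?:(?!\1).)*(\1)      #     3rd occurrence, captured
            (?!.*?\1)             #     and no further occurrence
        )
    )
""", re.VERBOSE)

def diagram(text):
    """One row per match: M marks the match, C marks each capture."""
    rows = []
    for m in PATTERN_N3.finditer(text):
        row = [" "] * len(text)
        row[m.start(1)] = "M"
        for g in (2, 3):
            if m.group(g) is not None:   # unset in an early match
                row[m.start(g)] = "C"
        rows.append("".join(row).rstrip())
    return rows

if __name__ == "__main__":
    for row in diagram("abcdbcdcdd"):
        print(row)
```

Each row corresponds to one regex match, so every occurrence of c and d is reported exactly once, either as a match or as a capture.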

Pseudocode to generate the pattern

// Variables: N (int)

character   = "(\w)"
early_match = "(?=(?:.*?\1){" + N + "})"

final_match = "(?="
for i = 1; i < N; i++
    final_match += "(?:(?!\1).)*(\1)"
final_match += "(?!.*?\1))"

pattern = character + "(?:" + early_match + "|" + final_match + ")"

JavaScript code

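I'll show an implementation in javascript because we can check the result here (and if it works in javascript, it works in any perl-compatible regex flavour, including .net, java, python, ruby, perl, and every language that implements pcre, among others).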

var str = 'abcdbcdcdd';
var pattern, re, match, N, i, x; // x added so it is not an implicit global
var output = "";

// We'll show the results for N = 2, 3 and 4
for (N = 2; N <= 4; N++) {
    // Generate pattern
    pattern = "(\\w)(?:(?=(?:.*?\\1){" + N + "})|(?=";
    for (i = 1; i < N; i++) {
        pattern += "(?:(?!\\1).)*(\\1)";
    }
    pattern += "(?!.*?\\1)))";
    re = new RegExp(pattern, "g");

    output += "<h3>N = " + N + "</h3><pre>Pattern: " + pattern + "\nText: " + str;

    // Loop all matches
    while ((match = re.exec(str)) !== null) {
        output += "\nPos: " + match.index + "\tMatch:";
        // Loop all captures (unset groups are undefined and end the loop)
        x = 1;
        while (match[x] != null) {
            output += " " + match[x];
            x++;
        }
    }

    output += "</pre>";
}

document.write(output);

Python3 code

As requested by the OP, I'm linking to a Python3 implementation in ideone.com
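
The linked code is not reproduced here. As a rough sketch of what a Python 3 version could look like, the following builds the pattern the same way as the pseudocode, using only the standard `re` module. `build_pattern` and `repeated_positions` are illustrative names, not taken from the linked implementation.

```python
import re

def build_pattern(n):
    """Assemble the early-match / final-capture pattern for a given n."""
    early = r"(?=(?:.*?\1){%d})" % n
    final = r"(?=" + r"(?:(?!\1).)*(\1)" * (n - 1) + r"(?!.*?\1))"
    return r"(\w)(?:" + early + "|" + final + ")"

def repeated_positions(text, n):
    """Indices of all matches and captures, merged and sorted."""
    positions = []
    for m in re.finditer(build_pattern(n), text):
        positions.append(m.start(1))
        # Groups 2..n are only set when the final-capture branch matched.
        for g in range(2, n + 1):
            if m.group(g) is not None:
                positions.append(m.start(g))
    return sorted(positions)

if __name__ == "__main__":
    for n in range(2, 6):
        print(n, repeated_positions("abcdbcdcdd", n))
```

Merging match and capture positions recovers every occurrence of each character that appears at least n times, which is the result the question asked for.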

Regarding "REGEX - Match any character which repeats n times", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33181434/
