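I have a homework problem where I need to use a regular expression to extract substrings from a larger string.
The goal is to select substrings that satisfy the following rule:
the substring starts and ends with the same uppercase character, and any uppercase character that is preceded by the digit 0 must be ignored.
For example, ZAp0ZuZAuX0AZA
would contain the matches ZAp0ZuZ
and AuX0AZA
I've been messing with this for hours and honestly haven't come close...
I've tried something like the code below, but it selects everything from the first uppercase letter to the last one. Here is what I have: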
[A-Z]{1}[[:alnum:]]*[A-Z]{1} <--- this selects the whole string
[A-Z]{1}[[:alnum:]][A-Z]{1} <--- this gives me strings like ZuZ, AuX
Any help is greatly appreciated; I'm completely stumped on this one.
Best answer
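A regular expression may not be the best tool for this, since you could simply split the string. However, if you have to or want to use one, this expression may give you an idea of the problems you might face as your list of characters grows: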
(?=.[A-Z])([A-Z])(.*?)\1
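I added (?=.[A-Z]), which is meant to require that an uppercase letter be present. You can remove it and the expression will still work; it is only a boundary you can add to your expression to be safe.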
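Note that the expression above does not yet apply the rule about ignoring uppercase characters preceded by 0. One possible way to add it, offered as a sketch rather than a definitive solution, is a negative lookbehind `(?<!0)` in front of both the opening letter and the backreference. The helper name `find_bounded` is hypothetical, and the sketch assumes that "ignore" means such a letter can neither open nor close a match.

```python
import re

# Assumption: an uppercase letter directly preceded by the digit 0
# can neither open nor close a match.
PATTERN = re.compile(r"(?<!0)([A-Z]).*?(?<!0)\1")


def find_bounded(text):
    """Return non-overlapping substrings that start and end with the same
    uppercase letter, skipping letters preceded by 0 (hypothetical helper)."""
    # group(0) is the whole match; group(1) would be only the opening letter.
    return [m.group(0) for m in PATTERN.finditer(text)]


print(find_bounded("ZAp0ZuZAuX0AZA"))  # ['ZAp0ZuZ', 'AuX0AZA']
```

The lazy `.*?` stops at the nearest closing letter that passes the lookbehind, which is why the Z after the 0 is skipped and the first match runs to the following Z.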
JavaScript test
const regex = /([A-Z])(.*?)\1/gm;
const str = `ZAp0ZuZAuX0AZA
ZApxxZuZAuXxafaAZA
ZApxaf09xZuZAuX090xafaAZA
abcZApxaf09xZuZAuX090xafaAZA`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Python test
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

regex = r"([A-Z])(.*?)\1"

test_str = ("ZAp0ZuZAuX0AZA\n"
            "ZApxxZuZAuXxafaAZA\n"
            "ZApxaf09xZuZAuX090xafaAZA\n"
            "abcZApxaf09xZuZAuX090xafaAZA")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
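As the answer notes, a regular expression is not the only option here. A minimal sketch of a plain left-to-right scan might look like the following. It assumes single-line input, treats only A-Z as uppercase, and assumes a letter directly preceded by 0 can neither open nor close a match; the name `find_bounded_scan` is hypothetical.

```python
def find_bounded_scan(text):
    """Collect non-overlapping substrings that start and end with the same
    uppercase letter, without using a regex (hypothetical helper)."""
    results = []
    i = 0
    while i < len(text):
        ch = text[i]
        # A valid opening letter is A-Z and not directly preceded by "0".
        if "A" <= ch <= "Z" and (i == 0 or text[i - 1] != "0"):
            # Find the nearest later occurrence that is not preceded by "0".
            j = text.find(ch, i + 1)
            while j != -1 and text[j - 1] == "0":
                j = text.find(ch, j + 1)
            if j != -1:
                results.append(text[i:j + 1])
                i = j + 1  # resume after the match so results do not overlap
                continue
        i += 1
    return results


print(find_bounded_scan("ZAp0ZuZAuX0AZA"))  # ['ZAp0ZuZ', 'AuX0AZA']
```

The explicit loop is longer than a pattern, but each rule (what may open a match, what may close it) sits on its own line, which can make later changes easier to reason about.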
Regarding "python - finding substrings that start and end with the same uppercase character", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56085044/