Suppose I have this string:
string <- "I2-1-EX-1-I3-1-EX-1-I2-1-I1-1-EX-1-I3-1-I2-1-EX-1-I2-1-I2-1-I1-1-I3-1-N2-1-I1-1-I1-1-I2-1-N2-1-N3-1-I1-1-NR-1-FA-1-NR-1-I3-1-I1-1-NR-1-N1-1-EX-1-QU-1-I3-1-NR-1-FA-1-EX-1-QU-1-NR-1-I2-1-I2-1-I2-1-NR-1-TR-1-I1-1-I2-1-I3-1-NR-1-I1-1-I1-1-EX-1-NR-1-NR-1-I1-1-NR-1-NR-1-I3-1-I2-1-NR-1-I1-1-QU-1-QU-1-I1-1-TR-1-QU-1-NR-1-NR-1-QU-1-TR-1-NR-1-I1-1-TR-1-I1-1-FA-1-I1-1-I2-1-QU-1-TR-1-FA-1-EX-1-QU-1-QU-1-QU-1-NR-1-QU-1-I1-1-TR-1-FA-1-QU-1-FA-1-FA-1-TR-1-FA-1-QU-1-EX-1-QU-1-I1-1-QU-1-QU-1-FA-1-FA-1-QU-1-QU-1-FA-1-FA-1-I3-1-NR-1-FA-1-I1-1-I2-1-FA-1-QU-1-FA-1-I2-1-FA-1-NR-1-I1-1-NR-1-TR-1-NR-1-EX-1-NR-1-NR-1-EX-1-TR-1-I3-1-I1-1-NR-1-NR-1-FA-1-I1-1-TR-1-EX-1-NR-1-NR-1-I1-1-I1-1-NR-1-I1-1-NR-1-EX-1-EX-1-EX-1-NR-1-NR-1-NR-1-FA-1-FA"
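I want to match everything that appears between two tokens containing "I". For example, starting from the beginning of the string, that means matching: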
-EX-
-EX-
-EX-
-EX-
-N2-
-N2-1-N3-
-NR-1-FA-1-NR-
etc...
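How can I achieve this kind of matching with a regular expression (ideally in R)?
I tried something like (?=<1|2|3).*(?=I), but it doesn't seem to work. My reasoning behind this regex was that every I token ends in 1, 2, or 3, which would be the left-hand boundary the lookbehind should find, and I would be the right-hand boundary the lookahead should find.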
Best answer
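It seems you are trying to get all the characters between I[123]-1 and 1-I[123]. \K keeps the text matched so far out of the overall regex match. (?:(?!I[123]).)*? matches one character at a time, lazily and zero or more times, and only when that character is not the I that starts an I[123]; at such a position the match fails.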
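To see what \K contributes, here is a minimal sketch on a shortened sample string. The string `s` and the variable names are assumptions made for illustration and are not part of the original data. It runs the same pattern once without \K and once with it.

```r
# Shortened sample string (an illustrative assumption, not the full input)
s <- "I2-1-EX-1-I3-1-I1-1-N2-1-N3-1-I1"

# Without \K, the leading I[123]-1 boundary is part of each reported match
pat_prefix <- "I[123]-1-(?:(?!I[123]).)*?-(?=1-I[123])"
with_prefix <- regmatches(s, gregexpr(pat_prefix, s, perl = TRUE))[[1]]

# With \K, everything matched before it is dropped from the reported match
pat_k <- "I[123]-1\\K-(?:(?!I[123]).)*?-(?=1-I[123])"
without_prefix <- regmatches(s, gregexpr(pat_k, s, perl = TRUE))[[1]]

print(with_prefix)
print(without_prefix)
```

The adjacent pair I3-1-I1 produces no match in either case, because the tempered dot refuses to step over the I that starts I1.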
> x <- "I2-1-EX-1-I3-1-EX-1-I2-1-I1-1-EX-1-I3-1-I2-1-EX-1-I2-1-I2-1-I1-1-I3-1-N2-1-I1-1-I1-1-I2-1-N2-1-N3-1-I1-1-NR-1-FA-1-NR-1-I3-1-I1-1-NR-1-N1-1-EX-1-QU-1-I3-1-NR-1-FA-1-EX-1-QU-1-NR-1-I2-1-I2-1-I2-1-NR-1-TR-1-I1-1-I2-1-I3-1-NR-1-I1-1-I1-1-EX-1-NR-1-NR-1-I1-1-NR-1-NR-1-I3-1-I2-1-NR-1-I1-1-QU-1-QU-1-I1-1-TR-1-QU-1-NR-1-NR-1-QU-1-TR-1-NR-1-I1-1-TR-1-I1-1-FA-1-I1-1-I2-1-QU-1-TR-1-FA-1-EX-1-QU-1-QU-1-QU-1-NR-1-QU-1-I1-1-TR-1-FA-1-QU-1-FA-1-FA-1-TR-1-FA-1-QU-1-EX-1-QU-1-I1-1-QU-1-QU-1-FA-1-FA-1-QU-1-QU-1-FA-1-FA-1-I3-1-NR-1-FA-1-I1-1-I2-1-FA-1-QU-1-FA-1-I2-1-FA-1-NR-1-I1-1-NR-1-TR-1-NR-1-EX-1-NR-1-NR-1-EX-1-TR-1-I3-1-I1-1-NR-1-NR-1-FA-1-I1-1-TR-1-EX-1-NR-1-NR-1-I1-1-I1-1-NR-1-I1-1-NR-1-EX-1-EX-1-EX-1-NR-1-NR-1-NR-1-FA-1-FA"
> regmatches(x, gregexpr("I[123]-1\\K-(?:(?!I[123]).)*?-(?=1-I[123])", x , perl=TRUE))
[[1]]
[1] "-EX-"
[2] "-EX-"
[3] "-EX-"
[4] "-EX-"
[5] "-N2-"
[6] "-N2-1-N3-"
[7] "-NR-1-FA-1-NR-"
[8] "-NR-1-N1-1-EX-1-QU-"
[9] "-NR-1-FA-1-EX-1-QU-1-NR-"
[10] "-NR-1-TR-"
[11] "-NR-"
[12] "-EX-1-NR-1-NR-"
[13] "-NR-1-NR-"
[14] "-NR-"
[15] "-QU-1-QU-"
[16] "-TR-1-QU-1-NR-1-NR-1-QU-1-TR-1-NR-"
[17] "-TR-"
[18] "-FA-"
[19] "-QU-1-TR-1-FA-1-EX-1-QU-1-QU-1-QU-1-NR-1-QU-"
[20] "-TR-1-FA-1-QU-1-FA-1-FA-1-TR-1-FA-1-QU-1-EX-1-QU-"
[21] "-QU-1-QU-1-FA-1-FA-1-QU-1-QU-1-FA-1-FA-"
[22] "-NR-1-FA-"
[23] "-FA-1-QU-1-FA-"
[24] "-FA-1-NR-"
[25] "-NR-1-TR-1-NR-1-EX-1-NR-1-NR-1-EX-1-TR-"
[26] "-NR-1-NR-1-FA-"
[27] "-TR-1-EX-1-NR-1-NR-"
[28] "-NR-"
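As a cross-check, the same pieces can probably be recovered without lookarounds by cutting the string at every I[123] token and trimming the "-1" and "1-" separators. This is a different technique from the one in the answer, sketched here on a shortened sample string; `s`, `pieces`, `inner`, `trimmed` and `between` are hypothetical names, and the sketch assumes every I token is followed by "-1", as in the data above.

```r
# Shortened sample string (an illustrative assumption, not the full input)
s <- "I2-1-EX-1-I3-1-I1-1-N2-1-N3-1-I1-1-FA"

# invert = TRUE returns the text around the I[123] tokens instead of the tokens
pieces <- regmatches(s, gregexpr("I[123]", s), invert = TRUE)[[1]]

# The first and last pieces are not enclosed by two I tokens, so drop them
inner <- pieces[-c(1, length(pieces))]

# Strip the "-1" that follows the left token and the "1-" before the right one
trimmed <- sub("1-$", "", sub("^-1", "", inner))

# Adjacent I tokens leave only "-" behind; keep pieces with real content
between <- trimmed[grepl("^-.+-$", trimmed)]

print(between)
```

On this sample the result agrees with what the \K pattern returns, which makes it a handy sanity check on the regex output.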
Regarding "regex - match everything between two strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29025619/