r - Can quantifiers be used in regex replacement in R?

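My goal is to replace a string with a symbol repeated as many times as the string has characters, in the same way that letters can be converted to upper case with \\U\\1. If my pattern were "...(*)...", the replacement for whatever (*) captures would be something like x\\q1 or {\\q1}x, so that I would get as many x's as there are characters captured by *.

Is this possible?

I am mainly thinking of sub and gsub, but you can answer with other libraries such as stringi, stringr, etc. You can use perl = TRUE or perl = FALSE and any other options that are convenient.

I suspect the answer may be no, since the options seem to be quite limited (?gsub):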

a replacement for matched pattern in sub and gsub. Coerced to character if possible. For fixed = FALSE this can include backreferences "\1" to "\9" to parenthesized subexpressions of pattern. For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. If a character vector of length 2 or more is supplied, the first element is used with a warning. If NA, all elements in the result corresponding to matches will be set to NA.
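
As a small illustration of the replacement features described in that passage, the following sketch combines a backreference with the perl-only case-conversion escapes. The input string `x` is a made-up example and not part of the original question.

```r
# Hypothetical input, used only for illustration
x <- "hello world"

# \\1 re-inserts the captured letter; \\U upper-cases it (needs perl = TRUE)
capitalised <- gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
print(capitalised)
# [1] "Hello World"

# \\E ends the case conversion, so only the first word is upper-cased
first_upper <- gsub("(\\w+) (\\w+)", "\\U\\1\\E \\2", x, perl = TRUE)
print(first_upper)
# [1] "HELLO world"
```

Note that none of these escapes carries any information about how many characters a quantifier consumed.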

The main quantifiers are (?base::regex):

?
    The preceding item is optional and will be matched at most once.

*
    The preceding item will be matched zero or more times.

+
    The preceding item will be matched one or more times.

{n}
    The preceding item is matched exactly n times.

{n,}
    The preceding item is matched n or more times.

{n,m}
    The preceding item is matched at least n times, but not more than m times.
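
To make the list concrete, here is a minimal sketch of each quantifier used on the pattern side, which is the only place where they have a special meaning. The test vector `x` is a hypothetical example.

```r
# Hypothetical inputs: "a", then zero to three "b"s, then "c"
x <- c("ac", "abc", "abbc", "abbbc")

q_optional  <- grepl("^ab?c$", x)     # TRUE  TRUE  FALSE FALSE
q_star      <- grepl("^ab*c$", x)     # TRUE  TRUE  TRUE  TRUE
q_plus      <- grepl("^ab+c$", x)     # FALSE TRUE  TRUE  TRUE
q_exact     <- grepl("^ab{2}c$", x)   # FALSE FALSE TRUE  FALSE
q_at_least  <- grepl("^ab{2,}c$", x)  # FALSE FALSE TRUE  TRUE
q_range     <- grepl("^ab{1,2}c$", x) # FALSE TRUE  TRUE  FALSE
```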

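OK, but there seems to be an option (not in PCRE, and I am not sure whether it exists in Perl or anywhere else...), (*), which captures the number of characters the star quantifier was able to match (I found it at https://www.rexegg.com/regex-quantifier-capture.html), so that \q1 (same reference) could be used to refer to the first captured quantifier (and \q2, etc.). I also read that (*) is equivalent to {0,}, but I am not sure whether that is really relevant to what I am interested in.

Edit (update):

Since commenters asked, I am updating my question with a specific example taken from another interesting question. I have modified the example a little. Say we have a <- "I hate extra spaces elephant" and we want to keep a single space between words and the first 5 characters of each word (up to here this is the original question), followed by a dot for each remaining character (I am not sure whether this is what the original question intended, but it does not matter). The resulting string would be "I hate extra space. eleph..." (one . for the final s in spaces, and 3 dots for the 3 letters ant at the end of elephant). So I started by keeping the first 5 characters with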

gsub("(?<!\\S)(\\S{5})\\S*", "\\1", a, perl = TRUE)
[1] "I hate extra space eleph"

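How should I replace the exact number of characters matched by \\S* with dots or any other symbol?

Best answer

You cannot use quantifiers in the replacement pattern, nor can you refer to the number of characters they matched.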
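
A quick way to see this, as a minimal sketch with a made-up input string: a quantifier written after a backreference in the replacement is simply copied into the result as literal text.

```r
# Hypothetical input, used only for illustration
x <- "banana"

# "{3}" after the backreference is not a quantifier here; it is copied literally
literal <- gsub("(a)", "\\1{3}", x)
print(literal)
# [1] "ba{3}na{3}na{3}"
```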

What you need is a \G-based PCRE pattern, which finds consecutive matches after a specific position in the string:

a <- "I hate extra spaces elephant"
gsub("(?:\\G(?!^)|(?<!\\S)\\S{5})\\K\\S", ".", a, perl = TRUE)

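See the R demo and the regex demo.

Details

(?:\G(?!^)|(?<!\S)\S{5}) - either the end of the previous successful match, or five non-whitespace characters that are not preceded by a non-whitespace character
\K - a match reset operator that discards the text matched so far
\S - any non-whitespace character.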
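
If the \G construct is hard to follow, the same result can be sketched without it. This is an alternative to the answer's pattern, not part of it: the replacement text is computed in R from the length of each match, using gregexpr, regmatches<- and strrep from base R. The helper name `mask_tail` is hypothetical.

```r
a <- "I hate extra spaces elephant"

# Hypothetical helper: keep the first 5 characters, turn the rest into dots
mask_tail <- function(words) {
  paste0(substr(words, 1, 5), strrep(".", nchar(words) - 5))
}

# Locate every run of 6 or more non-whitespace characters (the long words)
m <- gregexpr("\\S{6,}", a, perl = TRUE)

# Replace each located word with its masked version
masked <- a
regmatches(masked, m) <- lapply(regmatches(masked, m), mask_tail)
print(masked)
# [1] "I hate extra space. eleph..."
```

The single gsub call from the answer is more compact; this version trades brevity for a replacement that can depend on the match length in any way you like.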

Regarding "r - Can quantifiers be used in regex replacement in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64594677/
