r - How to use `regex` to add a % sign only to the specific numbers that don't already have one

I'm trying to use regex to add a % sign to the numbers in the range 0 to 10 in my data frame. The data frame looks like this:

tt <- structure(list(comment = c("3.22%-1ST $100000 AND 1.15% BALANCE", 
"3.25%  1ST $100000 AND 1.16%  BALANCE", "3.22% 1ST 100000 AND 1.16  BALANCE", 
"3.22% 1ST 100000 AND 1.15%  BALANCE", "3.26-100 AND 1.16", "3.26-100 AND 1.16"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

1 3.22%-1ST $100000 AND 1.15% BALANCE  
2 3.25%  1ST $100000 AND 1.16%  BALANCE
3 3.22% 1ST 100000 AND 1.16  BALANCE   
4 3.22% 1ST 100000 AND 1.15%  BALANCE  
5 3.26-100 AND 1.16                    
6 3.26-100 AND 1.16

So basically, I only want to add a % to 1.16 in row 3, and to 3.26 and 1.16 in rows 5 and 6. I wrote the following code:

tt$modified <- gsub("([0-9]\\.[0-9][0-9])", "\\1%", tt$comment)

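But, as shown below, this adds a % to every matching number, including the ones that already have one (which then end up with %%):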

  comment                               modified                               
  <chr>                                 <chr>                                  
1 3.22%-1ST $100000 AND 1.15% BALANCE   3.22%%-1ST $100000 AND 1.15%% BALANCE  
2 3.25%  1ST $100000 AND 1.16%  BALANCE 3.25%%  1ST $100000 AND 1.16%%  BALANCE
3 3.22% 1ST 100000 AND 1.16  BALANCE    3.22%% 1ST 100000 AND 1.16%  BALANCE   
4 3.22% 1ST 100000 AND 1.15%  BALANCE   3.22%% 1ST 100000 AND 1.15%%  BALANCE  
5 3.26-100 AND 1.16                     3.26%-100 AND 1.16%                    
6 3.26-100 AND 1.16                     3.26%-100 AND 1.16%

How can I fix this?

Best answer

You can make judicious use of lookarounds here to ensure that the percent sign is only added where you want it:

# df is the question's data frame (called tt there); perl=TRUE enables the lookahead
df$comment <- gsub("\\b(\\d+\\.\\d+)\\b(?![%.])", "\\1%", df$comment, perl=TRUE)
df

                                comment
1   3.22%-1ST $100000 AND 1.15% BALANCE
2 3.25%  1ST $100000 AND 1.16%  BALANCE
3   3.22% 1ST 100000 AND 1.16%  BALANCE
4   3.22% 1ST 100000 AND 1.15%  BALANCE
5                   3.26%-100 AND 1.16%
6                   3.26%-100 AND 1.16%

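Note that I am assuming here that you only want to target decimal numbers. If you might also want to target integers, we would need more information about the context of all the replacements.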
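To check which numbers the pattern will act on before replacing anything, one can list its matches with base R's `regmatches()` and `gregexpr()`. This is a minimal sketch that assumes the question's comment values are held in a plain character vector; the names `comment`, `pattern`, `hits` and `fixed` are illustrative only.

```r
# The comment column from the question, as a plain character vector
comment <- c("3.22%-1ST $100000 AND 1.15% BALANCE",
             "3.25%  1ST $100000 AND 1.16%  BALANCE",
             "3.22% 1ST 100000 AND 1.16  BALANCE",
             "3.22% 1ST 100000 AND 1.15%  BALANCE",
             "3.26-100 AND 1.16",
             "3.26-100 AND 1.16")

# The answer's pattern; perl = TRUE is required for the (?!...) lookahead
pattern <- "\\b(\\d+\\.\\d+)\\b(?![%.])"

# One character vector per row, holding the numbers that will get a %
hits <- regmatches(comment, gregexpr(pattern, comment, perl = TRUE))
str(hits)

# Apply the replacement once the matches look right
fixed <- gsub(pattern, "\\1%", comment, perl = TRUE)
cat(fixed, sep = "\n")
```

Rows 1, 2 and 4 produce no matches (`character(0)`), row 3 matches only `1.16`, and rows 5 and 6 match `3.26` and `1.16`, which are exactly the numbers the question wants to change.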

The regex pattern says:

\b            match a word boundary (start of the number)
(             capture
    \d+\.\d+  a number with a decimal component
)             end capture
\b            word boundary
(?![%.])      assert that what follows is NOT % or .

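Note that the final negative lookahead prevents a replacement on any number that is already followed by a %, or that is immediately followed by another decimal point.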
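The effect of that lookahead can be seen by running the same replacement with and without it on two rows from the question. This is an illustrative sketch; the variable names `x`, `no_lookahead` and `with_lookahead` are hypothetical.

```r
x <- c("3.22% 1ST 100000 AND 1.16  BALANCE",
       "3.26-100 AND 1.16")

# Same pattern as the answer, minus the final (?![%.]) lookahead
no_lookahead <- gsub("\\b(\\d+\\.\\d+)\\b", "\\1%", x, perl = TRUE)
# The answer's pattern: a number followed by % or . is skipped
with_lookahead <- gsub("\\b(\\d+\\.\\d+)\\b(?![%.])", "\\1%", x, perl = TRUE)

cat(no_lookahead, sep = "\n")
# 3.22%% 1ST 100000 AND 1.16%  BALANCE
# 3.26%-100 AND 1.16%
cat(with_lookahead, sep = "\n")
# 3.22% 1ST 100000 AND 1.16%  BALANCE
# 3.26%-100 AND 1.16%
```

Without the lookahead, 3.22 is matched even though it already carries a %, so it ends up with %%, which is the same problem the question ran into.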

A similar question, "r - How to use `regex` to add a % sign only to the specific numbers that don't already have one", can be found on Stack Overflow: https://stackoverflow.com/questions/64708460/
