I am trying to use a regex to add a % sign to numbers in the range 0 to 10 in my data frame. The data frame looks like this:
structure(list(comment = c("3.22%-1ST $100000 AND 1.15% BALANCE",
"3.25% 1ST $100000 AND 1.16% BALANCE", "3.22% 1ST 100000 AND 1.16 BALANCE",
"3.22% 1ST 100000 AND 1.15% BALANCE", "3.26-100 AND 1.16", "3.26-100 AND 1.16"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
1 3.22%-1ST $100000 AND 1.15% BALANCE
2 3.25% 1ST $100000 AND 1.16% BALANCE
3 3.22% 1ST 100000 AND 1.16 BALANCE
4 3.22% 1ST 100000 AND 1.15% BALANCE
5 3.26-100 AND 1.16
6 3.26-100 AND 1.16
So basically, I only want to add % to 1.16 in row 3, and to 3.26 and 1.16 in rows 5 and 6. I wrote the following code:
tt$modified <- gsub("([0-9]\\.[0-9][0-9])", "\\1%", tt$comment)
But as shown below, this adds % to every number, including the ones that already have it:
comment modified
<chr> <chr>
1 3.22%-1ST $100000 AND 1.15% BALANCE 3.22%%-1ST $100000 AND 1.15%% BALANCE
2 3.25% 1ST $100000 AND 1.16% BALANCE 3.25%% 1ST $100000 AND 1.16%% BALANCE
3 3.22% 1ST 100000 AND 1.16 BALANCE 3.22%% 1ST 100000 AND 1.16% BALANCE
4 3.22% 1ST 100000 AND 1.15% BALANCE 3.22%% 1ST 100000 AND 1.15%% BALANCE
5 3.26-100 AND 1.16 3.26%-100 AND 1.16%
6 3.26-100 AND 1.16 3.26%-100 AND 1.16%
How can I fix this?
Best answer
You can make judicious use of lookarounds here to ensure that the percent sign is added only where you want it:
df$comment <- gsub("\\b(\\d+\\.\\d+)\\b(?![%.])", "\\1%", df$comment, perl=TRUE)
df
comment
1 3.22%-1ST $100000 AND 1.15% BALANCE
2 3.25% 1ST $100000 AND 1.16% BALANCE
3 3.22% 1ST 100000 AND 1.16% BALANCE
4 3.22% 1ST 100000 AND 1.15% BALANCE
5 3.26%-100 AND 1.16%
6 3.26%-100 AND 1.16%
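Note that I am assuming here that you only want to target decimal numbers. If you might also want to target integers, then we would need more information about all of the replacement contexts.
The regex pattern says: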
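To illustrate that assumption, here is a small sketch using hypothetical input strings (they are not from the question). Because the pattern requires a decimal point between two runs of digits, bare integers are left alone:

```r
# Same pattern as in the answer; perl = TRUE is needed for the lookahead
pattern <- "\\b(\\d+\\.\\d+)\\b(?![%.])"

# Hypothetical inputs, made up for illustration only
x <- c("5 AND 1.16", "100 BALANCE")

result <- gsub(pattern, "\\1%", x, perl = TRUE)
print(result)
# [1] "5 AND 1.16%" "100 BALANCE"
```

Only the decimal 1.16 gains a percent sign; 5 and 100 are untouched.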
\b match a word boundary (start of the number)
( capture
\d+\.\d+ a number with a decimal component
) end capture
\b word boundary
(?![%.]) assert that what follows is NOT % or .
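Note that the final negative lookahead prevents a replacement when the matched number is already followed by %, or is immediately followed by another dot.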
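As a self-contained check, the following sketch rebuilds a subset of the question's data as a plain data.frame (an assumption; the question uses a tibble) and stores the result in a new column, here called `modified` as in the question's own attempt:

```r
# Three rows taken from the question's sample data
df <- data.frame(
  comment = c(
    "3.22%-1ST $100000 AND 1.15% BALANCE",
    "3.22% 1ST 100000 AND 1.16 BALANCE",
    "3.26-100 AND 1.16"
  ),
  stringsAsFactors = FALSE
)

# Decimal number between word boundaries, not followed by % or .
pattern <- "\\b(\\d+\\.\\d+)\\b(?![%.])"

# perl = TRUE enables the PCRE engine, which supports lookaheads
df$modified <- gsub(pattern, "\\1%", df$comment, perl = TRUE)
print(df$modified)
# [1] "3.22%-1ST $100000 AND 1.15% BALANCE"
# [2] "3.22% 1ST 100000 AND 1.16% BALANCE"
# [3] "3.26%-100 AND 1.16%"
```

Writing to a separate column keeps the original `comment` values available for comparison, whereas the answer's one-liner overwrites them in place.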
Regarding r - how to use `regex` to add a % sign only to specific strings that do not already have it, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64708460/