In my data, I have a column of open text field data similar to the following example:
library(tibble)
d <- tribble(
  ~x,
  "i am 10 and she is 50",
  "he is 32 and i am 22",
  "he may be 70 and she may be 99"
)
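I want to use a regex to extract all two-digit numbers into a new column named y. I have the following code, which works well for extracting the first match: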
library(dplyr)
library(stringr)
d %>%
  mutate(y = str_extract(x, "([0-9]{2})"))
# A tibble: 3 x 2
x y
<chr> <chr>
1 i am 10 and she is 50 10
2 he is 32 and i am 22 32
3 he may be 70 and she may be 99 70
However, is there a way to extract both numbers into the same column, separated by some standard delimiter such as a comma?
Best answer
We can also use extract and unite from tidyr:
library(dplyr)
library(tidyr)
d %>%
extract(x, c('y', 'z'), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE)
Output:
# A tibble: 3 x 3
x y z
<chr> <chr> <chr>
1 i am 10 and she is 50 10 50
2 he is 32 and i am 22 32 22
3 he may be 70 and she may be 99 70 99
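One property of this pattern is worth checking, shown here as a minimal sketch with a hypothetical three-number row `d3`: the two capture groups keep only the first two runs of digits, so any further numbers are dropped. Also, `\\d+` matches numbers of any length, not exactly two digits as the question's `[0-9]{2}` does.

```r
library(tibble)
library(tidyr)

# Hypothetical row with three numbers, used only to illustrate the pattern's limit
d3 <- tibble(x = "a 10 b 20 c 30")

# Same regex as the answer: two capture groups, so only two numbers are kept
res <- extract(d3, x, c("y", "z"), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE)
res
```

Here `y` is `"10"` and `z` is `"20"`; the trailing `30` is not captured by either group.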
To return a single column:
d %>%
extract(x, c('y', 'z'), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE) %>%
unite('y', y, z, sep = ', ')
Output:
# A tibble: 3 x 2
x y
<chr> <chr>
1 i am 10 and she is 50 10, 50
2 he is 32 and i am 22 32, 22
3 he may be 70 and she may be 99 70, 99
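If the number of matches varies from row to row, one alternative (not part of the answer above) is stringr's `str_extract_all()`. It returns a list holding every match for each row, and each element can then be collapsed with `paste(collapse = ", ")`. This is a sketch that rebuilds the question's data inline; the name `d2` is illustrative only.

```r
library(tibble)
library(dplyr)
library(stringr)

d <- tibble(x = c(
  "i am 10 and she is 50",
  "he is 32 and i am 22",
  "he may be 70 and she may be 99"
))

d2 <- d %>%
  mutate(
    # str_extract_all() gives a list: one character vector of matches per row;
    # vapply() collapses each vector into a single comma-separated string
    y = vapply(str_extract_all(x, "[0-9]{2}"), paste, character(1), collapse = ", ")
  )
d2
```

Rows with no match get an empty string. Note that `[0-9]{2}` also matches the first two digits of a longer number such as `100`. Wrapping it in word boundaries, `\\b[0-9]{2}\\b`, restricts it to standalone two-digit numbers.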
Regarding "r - Extract all matches into a new column using regex in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59671435/