r - Extracting all matches into a new column using regex in R

In my data, I have a column of free-text field data like the following example:

library(tidyverse)  # provides tribble(), %>%, mutate() and str_extract()

d <- tribble(
  ~x,
  "i am 10 and she is 50",
  "he is 32 and i am 22",
  "he may be 70 and she may be 99",
)

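I would like to use regex to extract all two-digit numbers into a new column called y. I have the following code, which works well for extracting the first match: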

d %>%
  mutate(y = str_extract(x, "([0-9]{2})"))

# A tibble: 3 x 2
  x                              y    
  <chr>                          <chr>
1 i am 10 and she is 50          10   
2 he is 32 and i am 22           32   
3 he may be 70 and she may be 99 70

However, is there a way to extract both numbers into the same column, using some standard separator (such as a comma)?

Best answer

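We can also use extract and unite from tidyr. The regex captures the first two runs of digits in each string, so this approach fits rows that contain two numbers: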

library(dplyr)
library(tidyr)

d %>%
  extract(x, c('y', 'z'), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE)

Output:

# A tibble: 3 x 3
  x                              y     z    
  <chr>                          <chr> <chr>
1 i am 10 and she is 50          10    50   
2 he is 32 and i am 22           32    22   
3 he may be 70 and she may be 99 70    99

To return a single column:

d %>%
  extract(x, c('y', 'z'), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE) %>%
  unite('y', y, z, sep = ', ')

Output:

# A tibble: 3 x 2
  x                              y     
  <chr>                          <chr> 
1 i am 10 and she is 50          10, 50
2 he is 32 and i am 22           32, 22
3 he may be 70 and she may be 99 70, 99
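
If the number of matches per row is not known in advance, a variation that is not part of the answer above is to collect every match with stringr's str_extract_all and collapse each row's matches into one string. The following is a minimal sketch under that assumption; it reuses the question's pattern "[0-9]{2}" and the sample data, and the name `result` is only illustrative:

```r
library(dplyr)
library(stringr)
library(tibble)

# Sample data from the question
d <- tribble(
  ~x,
  "i am 10 and she is 50",
  "he is 32 and i am 22",
  "he may be 70 and she may be 99"
)

result <- d %>%
  mutate(
    # str_extract_all() returns a list with one character vector per row;
    # vapply() pastes each vector into a single comma-separated string
    y = vapply(
      str_extract_all(x, "[0-9]{2}"),
      paste,
      character(1),
      collapse = ", "
    )
  )

print(result)
```

Rows without any match end up with an empty string in y rather than NA, which may or may not be what is wanted.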

Regarding "r - Extracting all matches into a new column using regex in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59671435/
