r - separate (or a similar function) with multiple or no occurrences of the split character

I have a tibble like this:

library("tidyverse")
tib <- tibble(x = c("lemon", "yellow, banana", "red, big, apple"))

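I want to create two new columns named description and fruit, using separate to put the last word after the final comma into fruit and everything before it into description (if there is a comma; otherwise I just want to copy the word in the cell into fruit).

So far, I have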

tib %>%
    separate(x, ", ", into = c("description", "fruit"), remove = FALSE)

But this does not quite do what I want. It produces:

# A tibble: 3 x 3
  x               description fruit 
  <chr>           <chr>       <chr> 
1 lemon           lemon       NA    
2 yellow, banana  yellow      banana
3 red, big, apple red         big   
Warning messages:
1: Expected 2 pieces. Additional pieces discarded in 1 rows [3]. 
2: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].

My desired output is:

  x               description fruit 
1 lemon           NA          lemon    
2 yellow, banana  yellow      banana
3 red, big, apple red, big    apple

Can anyone point out what I am missing?

Edit

The goal does not have to be achieved with separate. A solution using mutate would be just as welcome!

Best answer

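It may be better to use extract. Here, capture groups let us pull the pieces out directly. It is easiest to anchor at the end of the string ( $ ) and work backwards: the second group captures the word ( \\w+ ) at the end, which may be preceded by an optional , and an optional whitespace character ( \\s ), and the first capture group ( (.*?) ) takes all the characters before that.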

library(tidyr)
library(dplyr)
# group 1 = everything before the last word, group 2 = the last word
tib %>%
   extract(x, into = c("description", "fruit"),
       regex = '(.*?),?\\s?(\\w+$)', remove = FALSE)
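
To see what each capture group picks up, the same pattern can be tried in base R. This is only an illustrative sketch, not part of the original answer: the names `pattern`, `matches`, `groups` and `description` are made up here, and it uses the PCRE engine (`perl = TRUE`) rather than whatever engine extract uses internally.

```r
x <- c("lemon", "yellow, banana", "red, big, apple")

# Same regex as in the extract() call above
pattern <- "(.*?),?\\s?(\\w+$)"

# regexec() returns the full match plus one entry per capture group
matches <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Keep only the two capture groups (columns 2 and 3 of the bound matrix)
groups <- do.call(rbind, matches)[, 2:3, drop = FALSE]
colnames(groups) <- c("description", "fruit")
print(groups)

# For "lemon" the first group matches the empty string, not NA,
# so an explicit conversion is needed to get a missing value
description <- ifelse(groups[, "description"] == "",
                      NA_character_, groups[, "description"])
print(description)
```

Assuming extract treats the empty first group the same way, "lemon" would get an empty string in description, so a final `mutate(description = na_if(description, ""))` step, as in the other options, would be needed to match the desired output exactly.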

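Or use separate with a regex lookaround, by specifying the separator as either a , followed by a space or the start of the string ( ^ ), in both cases followed by a word ( \\w+ ) at the end of the string ( $ ).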

tib %>%
   separate(x, into = c("description", "fruit"),
       sep = '(, |^)(?=\\w+$)', remove = FALSE) %>%
   # a split at the start of the string leaves "" on the left; turn it into NA
   mutate(description = na_if(description, ""))
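
The lookahead pattern is easier to follow when you look at where it matches in each string. The following base R sketch (the names `sep_pattern` and `hit` are illustrative, and PCRE via `perl = TRUE` is assumed to behave like the engine separate uses for this pattern) prints the start position and width of the separator match.

```r
x <- c("lemon", "yellow, banana", "red, big, apple")

# Same separator as in the separate() call above
sep_pattern <- "(, |^)(?=\\w+$)"

# regexpr() reports the position of the first match and its length
hit <- regexpr(sep_pattern, x, perl = TRUE)

print(data.frame(
  x     = x,
  start = as.integer(hit),
  width = as.integer(attr(hit, "match.length"))
))
```

For "lemon" the match is a zero-width match at the start of the string, which is why the left-hand piece comes out empty and the `na_if` step is needed. For the other two strings only the last ", " matches, because the lookahead requires a single word up to the end of the string.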

Another option with separate is to insert a new delimiter before the last word and then use that delimiter as sep.

library(stringr)
tib %>% 
  mutate(x1 = str_replace(x, ',? ?(\\w+)$', ";\\1")) %>% 
  separate(x1, into = c("description", "fruit"), sep=";") %>%
  mutate(description = na_if(description, ""))
# A tibble: 3 x 3
#  x               description fruit 
#  <chr>           <chr>       <chr> 
#1 lemon           <NA>        lemon 
#2 yellow, banana  yellow      banana
#3 red, big, apple red, big    apple
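
Since the question also allows a mutate-based solution, here is one possible sketch that skips separate entirely. It is not from the original answer: it assumes the fruit is always the last run of word characters in the string, and the name `res` is illustrative.

```r
library(dplyr)
library(stringr)

tib <- tibble(x = c("lemon", "yellow, banana", "red, big, apple"))

res <- tib %>%
  mutate(
    # drop the last word together with the comma and spaces before it;
    # if nothing is left, store NA instead of ""
    description = na_if(str_remove(x, ",?\\s*\\w+$"), ""),
    # the last word in the string
    fruit = str_extract(x, "\\w+$")
  )

print(res)
```

Both patterns are anchored with `$`, so only the final word is affected and any earlier commas stay in description.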

For more on this topic (separate, or a similar function, with multiple or no occurrences of the split character), see the similar question on Stack Overflow: https://stackoverflow.com/questions/58309187/
