r - Is there a way to split a column and impute the implied values in R

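I am trying to split a column in a dataset whose codes are separated by "-". This raises two problems: I have to split the column, and I also want to impute the values implied by the "-". I was able to split the data using: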

separate_rows(df, code, sep = "-")

But I still haven't found a way to impute the implied values.

name <- c('group1', 'group1','group1','group2', 'group1', 'group1', 
'group1')
code <- c('93790', '98960 - 98962', '98966 - 98969', '99078', 'S5950', 
'99241 - 99245', '99247')
df <- data.frame( name, code)

The output I'm after looks like this:

group1 93790, 98960, 98961, 98962, 98966, 98967, 98968, 98969, S5950, 99241, 
99242, 99243, 99244, 99245, 99247
group2 99078

In this example, 98961, 98967 and 98968 are imputed: they are implied by the "-".

Any ideas on how to achieve this?

Best answer

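After splitting 'code', one option is to loop over the split elements with map, build a sequence (:) from each pair of endpoints, unnest, and then paste the values together within a group_by: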

library(dplyr)
library(stringr)
library(tidyr)
library(purrr)
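df %>% 
  # split each entry on " - ": a range gives two pieces, a single code gives one
  mutate(code = map(strsplit(as.character(code), " - "), ~  {
      x <- as.numeric(.x)  # non-numeric codes such as 'S5950' become NA here
      if(length(x) > 1)  x[1]:x[2] else x})) %>%
  unnest(code) %>%  # one row per expanded code
  group_by(name) %>%
  summarise(code = str_c(code, collapse=", "))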
# A tibble: 2 x 2
#   name   code                                                  
#   <fct>  <chr>                                                  
# 1 group1 93790, 98960, 98961, 98962, 98966, 98967, 98968, 98969
# 2 group2 99078
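
The expansion step inside map can be tried on a single value. The following is a minimal base R sketch of that step only; the variable names (parts, bounds, expanded, bad) are illustrative and not part of the original answer. It also shows why purely numeric conversion is not enough for a code like 'S5950'.

```r
# Split one range on " - " and convert the two endpoints to numbers.
parts  <- strsplit("98960 - 98962", " - ")[[1]]
bounds <- as.numeric(parts)

# Two endpoints: build the full sequence with `:`; one value: keep it as is.
expanded <- if (length(bounds) > 1) bounds[1]:bounds[2] else bounds
print(expanded)

# A code containing a letter cannot be converted and becomes NA (with a warning).
bad <- suppressWarnings(as.numeric("S5950"))
print(is.na(bad))
```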

Another option is to create a row index before separate_rows and use it as a grouping variable when calling complete:

df %>% 
    mutate(rn = row_number()) %>%
    separate_rows(code, convert = TRUE) %>% 
    group_by(rn, name) %>%
    complete(code = min(code):max(code)) %>%
    group_by(name) %>%
    summarise(code = str_c(code, collapse =", "))
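
To see why the row index matters here, the sketch below runs complete on a small toy tibble. The tibble and the names toy, per_row and per_name are assumptions made for illustration; the codes are the endpoints of two ranges from the question, already split into separate rows.

```r
library(dplyr)
library(tidyr)

# Toy data: two ranges that have already been split into their endpoints.
toy <- tibble(
  rn   = c(1L, 1L, 2L, 2L),
  name = "group1",
  code = c(98960L, 98962L, 98966L, 98969L)
)

# Grouped by the row index: each range is filled separately.
per_row <- toy %>%
  group_by(rn, name) %>%
  complete(code = min(code):max(code)) %>%
  ungroup()

# Grouped by name only: the gap between the two ranges is filled as well.
per_name <- toy %>%
  group_by(name) %>%
  complete(code = min(code):max(code)) %>%
  ungroup()

print(nrow(per_row))
print(nrow(per_name))
```

Grouping by rn should add only the codes that lie inside each original range, whereas grouping by name alone would also pull in 98963 to 98965, which were never implied by the data.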

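Update

If there are non-numeric elements (such as 'S5950'), fill in the range only for rows whose codes consist entirely of digits, and keep the other codes as they are: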

df %>% 
 mutate(rn = row_number()) %>%
 separate_rows(code, convert = TRUE) %>%
 group_by(name, rn) %>% 
 complete(code = if(any(str_detect(code, '\\D'))) code else 
     as.character(min(as.numeric(code)):max(as.numeric(code)))) %>% 
 group_by(name) %>%
 summarise(code = str_c(code, collapse =", "))
# A tibble: 2 x 2
#  name   code                                                                                                   
#  <fct>  <chr>                                                                                                  
#1 group1 93790, 98960, 98961, 98962, 98966, 98967, 98968, 98969, S5950, 99241, 99242, 99243, 99244, 99245, 99247
#2 group2 99078

This question, "r - Is there a way to split a column and impute the implied values in R", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/59271579/
