r - Extract part of a string value, create new column names, and widen the data frame

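I want to extract the last part of a string column (it is always enclosed in square brackets), use those parts as the names of new columns, and then reshape the data from long to wide, filling the new columns with the corresponding values.

For example, if I have this data frame: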

whatihave <- data_frame(v1 = c('abc [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'def [scope]', 'ghi [scope]'),
                        scores = c(1:6))

# A tibble: 6 x 2
  v1           scores
  <chr>         <int>
1 abc [effort]      1
2 def [effort]      2
3 ghi [effort]      3
4 abc [scope]       4
5 def [scope]       5
6 ghi [scope]       6

I want to turn it into this data frame:

whatiwant <- data_frame(v1 = c('abc', 'def', 'ghi'), 
                        effort = c(1, 2, 3),
                        scope = c(4, 5, 6))

  v1    effort scope
  <chr>  <dbl> <dbl>
1 abc        1     4
2 def        2     5
3 ghi        3     6

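As you can see, the text inside the square brackets at the end of each v1 value has become the name of one of two new variables (effort and scope). The values from the scores column then fill the new columns.

How can I do this?

Edit below

The toy data I mocked up is missing one key feature of my real data: there are actually several rows with the same v1 value and the same suffix in square brackets.

Because of that, the answer below (which is otherwise excellent) produces list-columns for effort and scope, where each cell holds a vector of values instead of a single value.

Let's suppose I have this data: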
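A minimal reproduction of the list-column problem, as a sketch: it applies the earlier answer's separate/str_remove/pivot_wider pipeline unchanged to the edited data. It assumes the dplyr, tidyr, stringr and tibble packages are installed, and uses tibble() in place of the deprecated data_frame(); the name `naive` is illustrative only.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)

# The edited data: some (v1, suffix) pairs now occur more than once
whatiactuallyhave <- tibble(
  v1 = c('abc [effort]', 'abc [effort]', 'def [effort]', 'def [effort]',
         'ghi [effort]', 'abc [scope]', 'abc [scope]', 'def [scope]',
         'ghi [scope]', 'ghi [scope]'),
  scores = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10')
)

# The earlier answer's pipeline, unchanged
naive <- whatiactuallyhave %>%
  separate(v1, into = c('v1', 'name'), sep = ' \\[') %>%
  mutate(name = str_remove(name, '\\]')) %>%
  pivot_wider(names_from = name, values_from = scores)
# pivot_wider() warns that the values are not uniquely identified:
# (v1, name) no longer picks out a single row, so each cell becomes a list

naive$effort[[1]]  # both 'abc [effort]' scores end up in one cell: "1" "2"
```

Only one row per v1 survives (abc, def, ghi), so there is nowhere to put the second score except inside the same cell.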

whatiactuallyhave <- data_frame(v1 = c('abc [effort]', 'abc [effort]', 'def [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'abc [scope]', 'def [scope]', 'ghi [scope]', 'ghi [scope]'), 
                        scores = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10'))

# A tibble: 10 x 2
   v1           scores
   <chr>        <chr> 
 1 abc [effort] 1     
 2 abc [effort] 2     
 3 def [effort] 3     
 4 def [effort] 4     
 5 ghi [effort] 5     
 6 abc [scope]  6     
 7 abc [scope]  7     
 8 def [scope]  8     
 9 ghi [scope]  9     
10 ghi [scope]  10

I'd like to turn it into this:

whatiactuallywant <- data_frame(v1 = c('abc', 'abc', 'def', 'def', 'ghi', 'ghi'), 
                        effort = c('1', '2', '3', '4', '5', 'NA'),
                        scope = c('6', '7', '8', 'NA', '9', '10'))

# A tibble: 6 x 3
  v1    effort scope
  <chr> <chr>  <chr>
1 abc   1      6    
2 abc   2      7    
3 def   3      8    
4 def   4      NA   
5 ghi   5      9    
6 ghi   NA     10

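I hope this is clearer now! Thanks a lot for your help.

Best answer

Revised scenario

Using tidyr::extract saves you an extra mutate step, because a single regex can pull the two strings you need directly into two columns.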
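A sketch of the extract() step in isolation, assuming tidyr and tibble are installed. The second half uses a hypothetical value 'foo bar [effort]' (not in the question) to show a caveat of the answer's regex: `\\w+` only matches word characters, and because the pattern is not anchored, a prefix containing a space is silently truncated to its last word. The anchored pattern `^(.*)\\s\\[(.*)\\]$` is one possible alternative, not part of the original answer.

```r
library(tibble)
library(tidyr)

df <- tibble(v1 = c('abc [effort]', 'ghi [scope]'), scores = c('5', '9'))

# The answer's regex: group 1 = word before ' [', group 2 = word inside []
parts <- extract(df, v1, into = c('v1', 'name'),
                 regex = '(\\w+)\\s\\[(\\w+)\\]')
parts

# Hypothetical prefix with a space: the first match starts at 'bar',
# so 'foo ' is dropped without any warning
tricky <- tibble(v1 = 'foo bar [effort]', scores = '1')
narrow <- extract(tricky, v1, into = c('v1', 'name'),
                  regex = '(\\w+)\\s\\[(\\w+)\\]')

# Anchored, greedy alternative: everything before the last ' [' becomes v1
anchored <- extract(tricky, v1, into = c('v1', 'name'),
                    regex = '^(.*)\\s\\[(.*)\\]$')

narrow$v1    # "bar"
anchored$v1  # "foo bar"
```

If every real prefix is a single word, as in the question, both patterns give the same result.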

library(tidyverse)
whatiactuallyhave <- data_frame(v1 = c('abc [effort]', 'abc [effort]', 'def [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'abc [scope]', 'def [scope]', 'ghi [scope]', 'ghi [scope]'), 
                                scores = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10'))
#> Warning: `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.

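whatiactuallyhave %>%
  # split 'abc [effort]' into v1 = 'abc' and name = 'effort' in one step
  tidyr::extract(v1, into = c('v1', 'name'), regex = '(\\w+)\\s\\[(\\w+)\\]') %>%
  # number repeated (v1, name) pairs 1, 2, ... so every cell has a unique key
  group_by(v1, name) %>%
  mutate(d = row_number()) %>%
  # one column per suffix; combinations with no score become NA
  pivot_wider(names_from = name, values_from = scores, values_fill = NA) %>%
  # drop the helper index
  select(-d)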

#> # A tibble: 6 x 3
#> # Groups:   v1 [3]
#>   v1    effort scope
#>   <chr> <chr>  <chr>
#> 1 abc   1      6    
#> 2 abc   2      7    
#> 3 def   3      8    
#> 4 def   4      <NA> 
#> 5 ghi   5      9    
#> 6 ghi   <NA>   10

Created on 2021-05-26 by the reprex package (v2.0.0)

Earlier answer
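The helper column d is what removes the list-columns: numbering the rows within each (v1, name) pair makes (v1, d, name) unique, so pivot_wider() has exactly one score per cell. The sketch below, assuming dplyr, tidyr and tibble are installed, checks that uniqueness and adds ungroup() (not in the original answer) so the result no longer carries the leftover `Groups: v1 [3]` seen in the output above. The names `keyed` and `result` are illustrative.

```r
library(tibble)
library(dplyr)
library(tidyr)

whatiactuallyhave <- tibble(
  v1 = c('abc [effort]', 'abc [effort]', 'def [effort]', 'def [effort]',
         'ghi [effort]', 'abc [scope]', 'abc [scope]', 'def [scope]',
         'ghi [scope]', 'ghi [scope]'),
  scores = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10')
)

keyed <- whatiactuallyhave %>%
  extract(v1, into = c('v1', 'name'), regex = '(\\w+)\\s\\[(\\w+)\\]') %>%
  group_by(v1, name) %>%
  mutate(d = row_number()) %>%  # 1, 2, ... within each (v1, name) pair
  ungroup()                     # drop grouping before reshaping

# (v1, d, name) is now a unique key: no duplicated combinations
anyDuplicated(keyed[c('v1', 'd', 'name')])  # 0

result <- keyed %>%
  pivot_wider(names_from = name, values_from = scores) %>%
  select(-d)
result
```

Note that the pairing is positional: the first effort score of abc lands next to the first scope score of abc. That matches the desired output in the question, but it only reflects row order, not any real link between those two scores.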

library(tidyverse)

whatihave <- data_frame(v1 = c('abc [effort]', 'def [effort]', 'ghi [effort]', 'abc [scope]', 'def [scope]', 'ghi [scope]'),
                        scores = c(1:6))

whatihave %>%
  # split at ' [': 'abc [effort]' -> 'abc' and 'effort]'
  separate(v1, into = c('v1', 'name'), sep = ' \\[') %>%
  # remove the trailing ']' left on the suffix
  mutate(name = str_remove(name, '\\]')) %>%
  # one column per suffix, filled from scores
  pivot_wider(names_from = name, values_from = scores)


# A tibble: 3 x 3
  v1    effort scope
  <chr>  <int> <int>
1 abc        1     4
2 def        2     5
3 ghi        3     6
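For comparison, a base R sketch of the earlier answer, using sub() and stats::reshape() instead of the tidyverse (this is a swapped-in technique, not part of the original answer). It assumes each (v1, suffix) pair occurs once, as in the original toy data; `pattern`, `long` and `wide` are illustrative names.

```r
whatihave <- data.frame(
  v1 = c('abc [effort]', 'def [effort]', 'ghi [effort]',
         'abc [scope]', 'def [scope]', 'ghi [scope]'),
  scores = 1:6,
  stringsAsFactors = FALSE
)

# One pattern, two capture groups: text before ' [' and text inside []
pattern <- '^(.*)\\s\\[(.*)\\]$'
long <- data.frame(
  v1 = sub(pattern, '\\1', whatihave$v1),
  name = sub(pattern, '\\2', whatihave$v1),
  scores = whatihave$scores,
  stringsAsFactors = FALSE
)

# One row per v1, one scores.<name> column per suffix
wide <- reshape(long, idvar = 'v1', timevar = 'name', direction = 'wide')
names(wide) <- sub('^scores\\.', '', names(wide))  # scores.effort -> effort
rownames(wide) <- NULL
wide
```

With the revised data, reshape() would keep only the first value for each repeated (v1, suffix) pair and warn, so a running index like the d column in the revised answer would still be needed there.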

For more on r - Extract part of a string value, create new column names, and widen the data frame, see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/67707999/
