我有一个如下所示的数据框:
date = c(rep(as.Date("2022/1/1"),3),
rep(as.Date("2022/1/2"),3),
rep(as.Date("2022/1/3"),3))
var1 = c(rep("cat",3),rep("cat,Mother,bike,pasta",3),rep("dog,bird",3))
a = seq(1,9,1)
b = seq(10,18,1)
c = seq(19,27,1)
df = tibble(date,a,b,c,var1);df
# A tibble: 9 × 5
date a b c var1
<date> <dbl> <dbl> <dbl> <chr>
1 2022-01-01 1 10 19 cat
2 2022-01-01 2 11 20 cat
3 2022-01-01 3 12 21 cat
4 2022-01-02 4 13 22 cat,Mother,bike,pasta
5 2022-01-02 5 14 23 cat,Mother,bike,pasta
6 2022-01-02 6 15 24 cat,Mother,bike,pasta
7 2022-01-03 7 16 25 dog,bird
8 2022-01-03 8 17 26 dog,bird
9 2022-01-03 9 18 27 dog,bird
我使用separate_rows和pivot更宽将字符向量分成列:
df%>%separate_rows(var1)%>%
group_by(date) %>%
mutate(id = row_number())%>%
pivot_wider(names_from = id, values_from = var1)
导致:
# A tibble: 9 × 16
# Groups: date [3]
date a b c `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12`
<date> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2022-01-01 1 10 19 cat NA NA NA NA NA NA NA NA NA NA NA
2 2022-01-01 2 11 20 NA cat NA NA NA NA NA NA NA NA NA NA
3 2022-01-01 3 12 21 NA NA cat NA NA NA NA NA NA NA NA NA
4 2022-01-02 4 13 22 cat Mother bike pasta NA NA NA NA NA NA NA NA
5 2022-01-02 5 14 23 NA NA NA NA cat Mother bike pasta NA NA NA NA
6 2022-01-02 6 15 24 NA NA NA NA NA NA NA NA cat Mother bike pasta
7 2022-01-03 7 16 25 dog bird NA NA NA NA NA NA NA NA NA NA
8 2022-01-03 8 17 26 NA NA dog bird NA NA NA NA NA NA NA NA
9 2022-01-03 9 18 27 NA NA NA NA dog bird NA NA NA NA NA NA
但我想要的数据框必须是这样的:
1
2
3
4
我怎样才能在 R 中做到这一点? 有什么帮助吗?
最佳答案
我们可以拆分“var1”并使用unnest_wider
library(dplyr)
library(tidyr)
df %>%
mutate(var1 = strsplit(var1, ",")) %>%
unnest_wider(var1, names_sep = "")
-输出
# A tibble: 9 × 8
date a b c var11 var12 var13 var14
<date> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 2022-01-01 1 10 19 cat <NA> <NA> <NA>
2 2022-01-01 2 11 20 cat <NA> <NA> <NA>
3 2022-01-01 3 12 21 cat <NA> <NA> <NA>
4 2022-01-02 4 13 22 cat Mother bike pasta
5 2022-01-02 5 14 23 cat Mother bike pasta
6 2022-01-02 6 15 24 cat Mother bike pasta
7 2022-01-03 7 16 25 dog bird <NA> <NA>
8 2022-01-03 8 17 26 dog bird <NA> <NA>
9 2022-01-03 9 18 27 dog bird <NA> <NA>
或与单独
library(tidyr)
library(stringr)
colnms <- as.character(seq_len(max(str_count(df$var1, ",")+1)))
separate(df, var1, into = colnms, sep = ",", fill = "right")
-输出
# A tibble: 9 × 8
date a b c `1` `2` `3` `4`
<date> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 2022-01-01 1 10 19 cat <NA> <NA> <NA>
2 2022-01-01 2 11 20 cat <NA> <NA> <NA>
3 2022-01-01 3 12 21 cat <NA> <NA> <NA>
4 2022-01-02 4 13 22 cat Mother bike pasta
5 2022-01-02 5 14 23 cat Mother bike pasta
6 2022-01-02 6 15 24 cat Mother bike pasta
7 2022-01-03 7 16 25 dog bird <NA> <NA>
8 2022-01-03 8 17 26 dog bird <NA> <NA>
9 2022-01-03 9 18 27 dog bird <NA> <NA>
关于r - 如何从大型数据框中删除 NA,但保留分隔列的结构?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/74239279/