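I have a data frame with 500 observations, but I show only 3 in the example. The rows are duplicates that hold different values in the different columns (except the ID column, which identifies the duplicated persons). I am reproducing what the data frame looks like now (df) and what it should look like after processing (df_new). Is this possible? The data frame has 10 variables, so I am not worried about "doubling" them. The values in the variables are a, b, c, d, 0, ''. However, I have kept them more general in the table.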
df <- data.frame(ID = c('1', '1', '2', '2', '3', '3'),
                 Year = c('smaller year.1', 'bigger year.1', 'bigger year.2', 'smaller year.2', 'same year.3', 'same year.3'),
                 V1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                 V2 = c('g', 'h', 'i', 'j', 'k', 'l'),
                 Vn = c('n1', 'n2', 'n3', 'n4', 'n5', 'n6'))
df_new <- data.frame(ID = c('1', '2', '3'),
                     Year_smaller = c('smaller year.1', 'smaller year.2', 'same year.3'),
                     Year_bigger = c('bigger year.1', 'bigger year.2', 'same year.3'),
                     V1 = c('a', 'c', 'e'),
                     V1.1 = c('b', 'd', 'f'),
                     V2 = c('g', 'i', 'k'),
                     V2.1 = c('h', 'j', 'l'),
                     Vn = c('n1', 'n3', 'n5'),
                     Vn.1 = c('n2', 'n4', 'n6'))
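Edited for the updated data and the revised requirements. Because b comes before s in the alphabet, bigger_year is shown before smaller_year here; with real data, however, your years will sort correctly. If you do want to sort strings like these in the reverse order, use sort(Year, decreasing = TRUE) instead of sort(Year). (sort(desc(Year)) is not suitable for this: dplyr's desc() returns a numeric sort key rather than the strings themselves.)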
df <- data.frame(ID = c('1', '1', '2', '2', '3', '3'),
                 Year = c('smaller year.1', 'bigger year.1', 'bigger year.2', 'smaller year.2', 'same year.3', 'same year.3'),
                 V1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                 V2 = c('g', 'h', 'i', 'j', 'k', 'l'),
                 Vn = c('n1', 'n2', 'n3', 'n4', 'n5', 'n6'))
library(tidyverse)
df %>%
  group_by(ID) %>%
  mutate(Year = sort(Year)) %>%
  mutate(rid = row_number()) %>%
  pivot_wider(id_cols = ID, names_from = rid, values_from = c(Year:Vn), names_sep = '')
#> # A tibble: 3 x 9
#> # Groups: ID [3]
#>   ID    Year1         Year2          V11   V12   V21   V22   Vn1   Vn2
#>   <chr> <chr>         <chr>          <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1     bigger year.1 smaller year.1 a     b     g     h     n1    n2
#> 2 2     bigger year.2 smaller year.2 c     d     i     j     n3    n4
#> 3 3     same year.3   same year.3    e     f     k     l     n5    n6
Created on 2021-06-19 by the reprex package (v2.0.0)
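To illustrate the remark that real years sort correctly, here is a minimal sketch with made-up numeric years. The data frame df_num and its values are hypothetical and are not part of the original question; only the pipeline mirrors the one above.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: numeric years stand in for the placeholder strings
df_num <- data.frame(ID = c('1', '1', '2', '2', '3', '3'),
                     Year = c(2001, 2005, 2010, 2003, 2007, 2007),
                     V1 = c('a', 'b', 'c', 'd', 'e', 'f'))

wide_num <- df_num %>%
  group_by(ID) %>%
  mutate(Year = sort(Year),        # smaller year first within each ID
         rid = row_number()) %>%   # 1 or 2: position inside the ID group
  ungroup() %>%
  pivot_wider(id_cols = ID, names_from = rid,
              values_from = c(Year, V1), names_sep = '')

wide_num
```

As in the answer, only Year is reordered within each ID; the other columns keep their original row order, which is what df_new asks for.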
library(tidyverse)
df %>%
  group_by(ID) %>%
  mutate(rid = row_number()) %>%
  pivot_wider(id_cols = ID, names_from = rid, values_from = c(Year:Variable_n), names_sep = '')
# A tibble: 3 x 9
# Groups: ID [3]
  ID    Year1          Year2          Variable_a1 Variable_a2 Variable_b1 Variable_b2 Variable_n1 Variable_n2
  <chr> <chr>          <chr>          <chr>       <chr>       <chr>       <chr>       <chr>       <chr>
1 1     smaller year.1 bigger year.1  va11        va12        vb11        vb12        vn11        vn12
2 2     bigger year.2  smaller year.2 va21        va22        vb21        vb22        vn21        vn22
3 3     same year.3    same year.3    va31        va32        vb31        vb32        vn31        vn32
Is this what you mean?
df %>%
  group_by(ID) %>%
  arrange(desc(Year)) %>%
  mutate(rid = row_number()) %>%
  pivot_wider(id_cols = ID, names_from = rid, values_from = c(Year:Variable_n), names_sep = '')
# A tibble: 3 x 9
# Groups: ID [3]
  ID    Year1          Year2         Variable_a1 Variable_a2 Variable_b1 Variable_b2 Variable_n1 Variable_n2
  <chr> <chr>          <chr>         <chr>       <chr>       <chr>       <chr>       <chr>       <chr>
1 2     smaller year.2 bigger year.2 va22        va21        vb22        vb21        vn22        vn21
2 1     smaller year.1 bigger year.1 va11        va12        vb11        vb12        vn11        vn12
3 3     same year.3    same year.3   va31        va32        vb31        vb32        vn31        vn32
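If the exact column names of df_new are wanted (Year_smaller, Year_bigger, V1, V1.1, ...), one possible follow-up is to rename after pivoting. This is only a sketch under assumptions: the '_' separator and the rename steps are my own choices for illustration, and the decreasing string sort is used solely because the placeholder labels "smaller ..." and "bigger ..." sort in the opposite order from real years.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c('1', '1', '2', '2', '3', '3'),
                 Year = c('smaller year.1', 'bigger year.1', 'bigger year.2',
                          'smaller year.2', 'same year.3', 'same year.3'),
                 V1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                 V2 = c('g', 'h', 'i', 'j', 'k', 'l'),
                 Vn = c('n1', 'n2', 'n3', 'n4', 'n5', 'n6'))

df_like_new <- df %>%
  group_by(ID) %>%
  # Decreasing sort puts the "smaller ..." label first for these placeholders
  mutate(Year = sort(Year, decreasing = TRUE),
         rid = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = ID, names_from = rid,
              values_from = Year:Vn, names_sep = '_') %>%
  # Year_1 / Year_2 become the descriptive names used in df_new
  rename(Year_smaller = Year_1, Year_bigger = Year_2) %>%
  # Remaining *_1 columns lose the suffix; *_2 columns get ".1"
  rename_with(~ sub('_1$', '', .x), .cols = ends_with('_1')) %>%
  rename_with(~ sub('_2$', '.1', .x), .cols = ends_with('_2'))

names(df_like_new)
```

With real numeric years, plain sort(Year) would take the place of the decreasing sort.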