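I have a data frame with an id variable, fruit in this example. For some of the other variables (taste, ranking) each id has only one corresponding value, while for others (color, origin) there are several.
I would like to collapse the data frame so that there is one row per id. For the variables with multiple values, I would ideally store them as a list.
However, I can't figure out how to do this. Here is my attempt, using summarise and unique on the variables with multiple values. However, I just get the original data back: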
df %>%
  group_by(fruit) %>%
  summarise(ranking = mean(ranking),
            taste = first(taste),
            origin = unique(origin),
            color = unique(color))
Sample data:
ranking <- c(1, 1, 2, 2, 3)
fruit <- c("apple", "apple", "pear", "pear", "banana")
color <- c("red", "green", "red", "green", "yellow")
taste <- c("good", "good", "good", "good", "okay")
origin <- c("WA", "CA", "OR", "MX", "PR")
df <- data.frame(fruit, ranking, color, taste, origin)
Desired data:
ranking <- c(1, 2, 3)
fruit <- c("apple", "pear", "banana")
color <- list((c("red", "green")), (c("red", "green")), (c("yellow")))
taste <- c("good", "good", "okay")
origin <- list(c("WA","CA"), c("OR", "MX"), c("PR"))
desired_df <- data.frame(fruit, ranking, taste, I(color), I(origin))
Is there a simple way to do this transformation?
Best answer
We can use summarise with across: after grouping, store the unique elements of the columns of interest in a list.
library(dplyr)
out <- df %>%
  group_by(fruit, ranking, taste) %>%
  summarise(across(c(color, origin), ~ list(unique(.))), .groups = 'drop')
-output
out
# A tibble: 3 x 5
  fruit  ranking taste color     origin
  <chr>    <dbl> <chr> <list>    <list>
1 apple        1 good  <chr [2]> <chr [2]>
2 banana       3 okay  <chr [1]> <chr [1]>
3 pear         2 good  <chr [2]> <chr [2]>
If taste should be the first element and ranking should be the mean:
out <- df %>%
  group_by(fruit) %>%
  summarise(ranking = mean(ranking),
            taste = first(taste),
            across(c(color, origin), ~ list(unique(.))), .groups = 'drop')
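Once the values are stored in list columns, it may help to see how to read them back. The following is a minimal sketch rather than part of the original answer: it rebuilds the sample data, reruns the summarise call, and then inspects the result. The name out_flat and the choice of ", " as separator are assumptions made for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  fruit   = c("apple", "apple", "pear", "pear", "banana"),
  ranking = c(1, 1, 2, 2, 3),
  color   = c("red", "green", "red", "green", "yellow"),
  taste   = c("good", "good", "good", "good", "okay"),
  origin  = c("WA", "CA", "OR", "MX", "PR"),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(fruit) %>%
  summarise(ranking = mean(ranking),
            taste = first(taste),
            across(c(color, origin), ~ list(unique(.x))),
            .groups = "drop")

# Each cell of a list column is itself a vector; use [[ ]] to extract one
apple_colors <- out$color[[1]]

# lengths() reports how many values each row holds
n_origins <- lengths(out$origin)

# Hypothetical flattening step: collapse each list cell into one string,
# e.g. for display or CSV export
out_flat <- out %>%
  mutate(across(c(color, origin),
                ~ vapply(.x, paste, character(1), collapse = ", ")))
```

Keeping the list columns preserves the individual values for later processing, while the flattened copy is only convenient for printing or writing to a text file.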
Or using base R:
aggregate(. ~ fruit + ranking + taste, unique(df), FUN = c)
-output
   fruit ranking taste      color origin
1  apple       1  good red, green WA, CA
2   pear       2  good red, green OR, MX
3 banana       3  okay     yellow     PR
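If explicit list columns are wanted without any packages, one possible alternative is to build them with split(). This is a sketch under the assumption that ranking and taste are constant within each fruit, as they are in the sample data; the name collapsed is introduced here for illustration and is not from the original answer.

```r
# Sample data from the question
df <- data.frame(
  fruit   = c("apple", "apple", "pear", "pear", "banana"),
  ranking = c(1, 1, 2, 2, 3),
  color   = c("red", "green", "red", "green", "yellow"),
  taste   = c("good", "good", "good", "good", "okay"),
  origin  = c("WA", "CA", "OR", "MX", "PR"),
  stringsAsFactors = FALSE
)

# One row per fruit for the single-valued columns
# (assumes ranking and taste do not vary within a fruit)
collapsed <- unique(df[c("fruit", "ranking", "taste")])
rownames(collapsed) <- NULL

# split() returns a list named by fruit; index it by collapsed$fruit
# so the list elements line up with the rows, then drop the names
collapsed$color  <- unname(split(df$color,  df$fruit)[collapsed$fruit])
collapsed$origin <- unname(split(df$origin, df$fruit)[collapsed$fruit])
```

Here collapsed keeps the row order in which each fruit first appears, and collapsed$color[[1]] holds the colors for the first fruit.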
Regarding "r - collapse a data frame by id, storing multiple values of a variable in a list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67853174/