My data frame has two factors with multiple levels, race and group. Here is a minimal example:
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
10 10 Black 1
11 11 Black 1
12 12 Black 1
13 13 Black 2
14 14 Black 2
15 15 Black 2
16 16 Black 2
17 17 Hispanic 1
18 18 Hispanic 1
19 19 Hispanic 1
20 20 Hispanic 1
21 21 Hispanic 1
22 22 Hispanic 2
23 23 Hispanic 2
24 24 Hispanic 2
25 25 Hispanic 2
I can build a single data frame that pairs "White" with one other race level, and then split that data by group, using the following function.
filter.race <- function(x, y) {
  f <- subset(x, race == "White" | race == y)
  f <- split(f, f$group)
  f
}
This returns:
filter.race(df, "Black")
$`1`
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
10 10 Black 1
11 11 Black 1
12 12 Black 1
$`2`
id race group
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
13 13 Black 2
14 14 Black 2
15 15 Black 2
16 16 Black 2
filter.race(df, "Hispanic")
$`1`
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
17 17 Hispanic 1
18 18 Hispanic 1
19 19 Hispanic 1
20 20 Hispanic 1
21 21 Hispanic 1
$`2`
id race group
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
22 22 Hispanic 2
23 23 Hispanic 2
24 24 Hispanic 2
25 25 Hispanic 2
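However, I am trying to find a way to apply this function to all levels of race in the data frame, rather than specifying y separately each time.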
Sample data:
dput(df)
structure(list(id = 1:25, race = structure(c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Black", "Hispanic", "White"), class = "factor"),
group = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)), .Names = c("id",
"race", "group"), class = "data.frame", row.names = c(NA, -25L
))
Best answer
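Consider by (an object-oriented wrapper for tapply) to first subset the data by race and group, and then on each iteration rbind the White rows of the corresponding group. For the White subsets themselves, unique removes the duplicated rows.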
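df_list <- by(df, df[c("race", "group")], function(sub) {
  # Prepend the White rows of the same group; unique() drops the
  # duplicates that arise when sub is itself the White subset.
  unique(
    rbind(subset(df, race == "White" & group == sub$group[1]),
          sub)
  )
})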
# race: Black
# group: 1
# id race group
# 1 1 White 1
# 2 2 White 1
# 3 3 White 1
# 4 4 White 1
# 5 5 White 1
# 10 10 Black 1
# 11 11 Black 1
# 12 12 Black 1
# ------------------------------------------------------------
# race: Hispanic
# group: 1
# id race group
# 1 1 White 1
# 2 2 White 1
# 3 3 White 1
# 4 4 White 1
# 5 5 White 1
# 17 17 Hispanic 1
# 18 18 Hispanic 1
# 19 19 Hispanic 1
# 20 20 Hispanic 1
# 21 21 Hispanic 1
# ------------------------------------------------------------
# race: White
# group: 1
# id race group
# 1 1 White 1
# 2 2 White 1
# 3 3 White 1
# 4 4 White 1
# 5 5 White 1
# ------------------------------------------------------------
# race: Black
# group: 2
# id race group
# 6 6 White 2
# 7 7 White 2
# 8 8 White 2
# 9 9 White 2
# 13 13 Black 2
# 14 14 Black 2
# 15 15 Black 2
# 16 16 Black 2
# ------------------------------------------------------------
# race: Hispanic
# group: 2
# id race group
# 6 6 White 2
# 7 7 White 2
# 8 8 White 2
# 9 9 White 2
# 22 22 Hispanic 2
# 23 23 Hispanic 2
# 24 24 Hispanic 2
# 25 25 Hispanic 2
# ------------------------------------------------------------
# race: White
# group: 2
# id race group
# 6 6 White 2
# 7 7 White 2
# 8 8 White 2
# 9 9 White 2
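If the nested shape returned by the original filter.race (one element per race, each split by group) is preferred, a minimal sketch is to loop over every race level except "White" with lapply. This is an alternative to the by approach above, not part of the original answer; the names other_races and race_list are illustrative, and the data frame is rebuilt here only so the snippet runs on its own.

```r
# Rebuild the sample data from the question (same values as the dput output).
df <- data.frame(
  id    = 1:25,
  race  = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9))),
  group = c(rep(1:2, times = c(5, 4)),   # White
            rep(1:2, times = c(3, 4)),   # Black
            rep(1:2, times = c(5, 4)))   # Hispanic
)

# The asker's function: keep White plus one other race, then split by group.
filter.race <- function(x, y) {
  f <- subset(x, race == "White" | race == y)
  split(f, f$group)
}

# Every race level except the reference level "White" (illustrative name).
other_races <- setdiff(levels(df$race), "White")

# Named list: one element per race, each holding the per-group split.
race_list <- lapply(setNames(other_races, other_races),
                    function(r) filter.race(df, r))

names(race_list)
# "Black" "Hispanic"

# White and Black rows of group 1
race_list$Black$`1`
```

Because the list is named by race, individual pieces can be pulled out with race_list$Hispanic$`2` and so on, and no White-only elements are created, so unique is not needed here.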
This question, "r - apply a function to each factor level in a list of data frames", was originally asked on Stack Overflow: https://stackoverflow.com/questions/56996449/