Suppose I have the following data. [Adding the data as requested.]
col1 <- c("Team A", "Team A", "Team A", "Team B", "Team B", "Team B", "Team C", "Team C", "Team C", "Team D", "Team D", "Team D")
col2 <- c("High", "Medium", "Medium", "Low", "Low", "Low", "High", "Medium", "Low", "Medium", "Medium", "Medium")
col3 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
col4 <- c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes")
df <- data.frame(col1, col2, col3, col4)
# col1    col2    col3  col4
# Team A  High    Yes   No
# Team A  Medium  Yes   Yes
# Team A  Medium  No    No
# Team B  Low     No    Yes
# Team B  Low     No    Yes
# Team B  Low     Yes   No
# Team C  High    No    No
# Team C  Medium  Yes   Yes
# Team C  Low     No    No
# Team D  Medium  Yes   Yes
# Team D  Medium  Yes   No
# Team D  Medium  Yes   Yes
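I want to use dplyr functions to get the following result. Status_1 should be the number of "Yes" values in col3 for each team, and Status_2 the number of "Yes" values in col4 for each team.

        High  Medium  Low  Status_1  Status_2
Team A     1       2    0         2         1
Team B     0       0    3         1         2
Team C     1       1    1         1         1
Team D     0       3    0         3         2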
I can generate the regular summary with the statement below, but not the last two columns, Status_1 and Status_2. Can anyone help?
library(dplyr)
library(tidyr)

df %>%
  group_by(col1, col2) %>%
  summarise(Count = n()) %>%
  spread(col1, Count, fill = 0)
Best answer
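First, group the data by col1 and count the number of "Yes" values in col3 and col4. Then group again by all columns and use n() to count the observations in each group. Finally, use tidyr::pivot_wider to reshape the data from long to wide.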
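library(dplyr)

df %>%
  group_by(col1) %>%
  # Per team, replace col3 and col4 with their number of "Yes" values
  mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%
  rename(status_1 = col3, status_2 = col4) %>%
  # Group by every column, then count the rows in each group
  group_by_all() %>%
  summarise(n = n()) %>%
  # Reshape from long to wide: one column per col2 value
  tidyr::pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0))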
# # A tibble: 4 x 6
# col1 status_1 status_2 High Medium Low
# <fct> <int> <int> <int> <int> <int>
# 1 Team A 2 1 1 2 0
# 2 Team B 1 2 0 0 3
# 3 Team C 1 1 1 1 1
# 4 Team D 3 2 0 3 0
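The same three steps can also be written with the newer dplyr verbs. This is only a sketch, not part of the original answer: it assumes dplyr 1.0 or later (for across()) and tidyr 1.1 or later (for a scalar values_fill), and the name result is purely illustrative. It swaps mutate_at() for mutate(across()) and the group_by_all() plus summarise() pair for count().

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the block runs on its own
df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes",
           "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No",
           "No", "Yes", "No", "Yes", "No", "Yes")
)

# `result` is a hypothetical name used only for this sketch
result <- df %>%
  group_by(col1) %>%
  # Per team, replace col3 and col4 with their number of "Yes" values
  mutate(across(c(col3, col4), ~ sum(.x == "Yes"))) %>%
  ungroup() %>%
  rename(status_1 = col3, status_2 = col4) %>%
  # count() tallies the rows for each combination into a column named n
  count(col1, status_1, status_2, col2) %>%
  # Reshape from long to wide, filling absent combinations with 0
  pivot_wider(names_from = col2, values_from = n, values_fill = 0)

print(result)
```

Because the per-team totals are constant within each team, including status_1 and status_2 in count() does not split any team into extra rows; they are simply carried along to the wide table.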
Regarding "r - Counting "Yes" in multiple columns of a data frame using dplyr", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59116678/