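I want to summarize my dataset by the percentage of women and men in each industry. I'm still learning R and can't figure this out.
My data:
Goal:
Best answer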
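1) proportions If your input is df1 (shown in reproducible form in the Note at the end), rename the count columns to the desired names and convert them to a matrix m. Then apply proportions to it with margin = 1 for row proportions (margin = 2 would give column proportions). We convert to a matrix in the first line because proportions requires one.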
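# rename the count columns and convert them to a numeric matrix
m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
# row proportions (margin = 1), scaled to percentages and appended to df1
cbind(df1, 100 * proportions(m, 1))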
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
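To make the margin argument concrete, here is a small sketch comparing margin = 1 and margin = 2 on the same counts, plus prop.table(), the older name for the same operation (proportions() was added in R 4.0.0). It rebuilds df1 from the Note so it runs on its own; by_row, by_col and by_row_old are illustrative names only.

```r
# Reproducible input, copied from the Note at the end of this answer
df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare",
  "Education", "Military", "Medicine", "Law", "Computer", "Sales"
  ), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
  ), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
  )), class = "data.frame", row.names = c(NA, -9L))

m <- as.matrix(df1[-1])  # 9 x 2 integer matrix of counts

# margin = 1: each row sums to 100 (gender split within one industry)
by_row <- 100 * proportions(m, 1)

# margin = 2: each column sums to 100 (how all men, or all women, spread across industries)
by_col <- 100 * proportions(m, 2)

# prop.table() gives the same result on R versions older than 4.0.0
by_row_old <- 100 * prop.table(m, 1)

round(cbind(by_row, by_col), 1)
```

For the question as asked (share of men vs. women within each industry), margin = 1 is the one you want.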
2) rowSums Alternatively, dividing df1[-1] by its rowSums gives the same result.
cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
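The rowSums approach works because dividing a data frame by a vector whose length equals the number of rows recycles that vector down each column, so element i of the totals divides row i of both Male and Female. A quick sketch checking that approaches 1 and 2 agree (res1, res2 and totals are illustrative names):

```r
df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare",
  "Education", "Military", "Medicine", "Law", "Computer", "Sales"
  ), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
  ), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
  )), class = "data.frame", row.names = c(NA, -9L))

# Approach 1: proportions() on a renamed matrix
m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
res1 <- cbind(df1, 100 * proportions(m, 1))

# Approach 2: divide the data frame by its row totals
totals <- rowSums(df1[-1])  # one total per industry, recycled down each column
res2 <- cbind(df1, setNames(100 * df1[-1] / totals, c("%M", "%F")))

# Same percentages, up to floating-point rounding
all.equal(res1[["%M"]], res2[["%M"]])
all.equal(res1[["%F"]], res2[["%F"]])
```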
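3) dplyr Use across to copy the count columns under new names (prefixed with %), multiply them by 100, and divide by the row total computed with c_across. Because each Industry appears once, grouping by Industry makes every group a single row.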
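library(dplyr)

df1 %>%
  group_by(Industry) %>%   # one row per group, so sum() below is the row total
  mutate(100 * across(c(Male, Female), .names = "%{.col}") /
           sum(c_across(c(Male, Female)))) %>%
  ungroup()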
## # A tibble: 9 x 5
## Industry Male Female `%Male` `%Female`
## <chr> <int> <int> <dbl> <dbl>
## 1 Art/Entertainment 100 500 16.7 83.3
## 2 Banking 600 100 85.7 14.3
## 3 Healthcare 53 65 44.9 55.1
## ...snip...
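The group_by(Industry) trick relies on each industry occupying exactly one row. dplyr's rowwise(), which c_across() is designed to pair with, makes that per-row intent explicit and does not depend on Industry being unique. A sketch under that assumption (res is an illustrative name):

```r
library(dplyr)

df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare",
  "Education", "Military", "Medicine", "Law", "Computer", "Sales"
  ), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
  ), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
  )), class = "data.frame", row.names = c(NA, -9L))

res <- df1 %>%
  rowwise() %>%  # each row is its own group
  # unnamed data-frame result is spliced in as %Male and %Female
  mutate(100 * across(c(Male, Female), .names = "%{.col}") /
           sum(c_across(c(Male, Female)))) %>%
  ungroup()

res
```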
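4) transform This one is close to another answer, but it does not overwrite the input. check.names = FALSE keeps the new column names as %M and %F instead of converting them to syntactic names such as X.M: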
transform(df1,
"%M" = 100 * Male / (Male + Female),
"%F" = 100 * Female / (Male + Female),
check.names = FALSE)
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
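All four approaches start from counts that are already tabulated, one row per industry. If the raw data instead had one row per person (an assumption; the question's data is only available as df1), table() produces a count table that proportions() accepts directly. The people data frame below is hypothetical, made up only for illustration:

```r
# Hypothetical long-format data: one row per person (not from the question)
people <- data.frame(
  Industry = c("Banking", "Banking", "Banking", "Law", "Law"),
  Gender   = c("Male", "Male", "Female", "Male", "Female")
)

tab <- table(people$Industry, people$Gender)  # counts: Industry x Gender
pct <- 100 * proportions(tab, 1)              # row percentages

# Back to a wide data frame with one column per gender
wide <- as.data.frame.matrix(pct)
wide
```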
Note
The input in reproducible form:
df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare",
"Education", "Military", "Medicine", "Law", "Computer", "Sales"
), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
)), class = "data.frame", row.names = c(NA, -9L))
For r - Summarize percentages by group in R, see the original question on Stack Overflow: https://stackoverflow.com/questions/66217898/