In R, I have a data.frame like this:
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex   = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)
df1
   grade    sex class
1      A   male     1
2      B   male     1
3      C   male     1
4      D   male     1
5      E   male     1
6      A female     1
7      B female     1
8      C female     1
9      D female     1
10     E female     1
11     A   male     2
12     B   male     2
13     C   male     2
14     D   male     2
15     E female     2
16     A female     2
17     B female     2
18     C female     2
19     D female     2
20     E female     2
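I want to calculate the percentage of each sex within each class and build another data frame from it, for example:
Class  Male_percent  Female_percentage
1      50%           50%
2      40%           60%
Could someone show me how to do this? This may have been asked before, but I don't know which keywords to search for. I apologize if this is a duplicate.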
Best answer
You could try
prop.table(table(df1[3:2]), 1) * 100
#      sex
# class female male
#     1     50   50
#     2     60   40
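If the result has to be a data frame laid out as in the question rather than a table object, the percentages can be pulled out column by column. A minimal sketch, assuming the column names Class, Male_percent and Female_percent from the question (pct and res are only illustrative names):

```r
# Rebuild the example data from the question
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex   = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Row-wise percentages: one row per class, one column per sex
pct <- prop.table(table(df1$class, df1$sex), 1) * 100

# Assemble the requested layout; round() guards against floating-point noise
res <- data.frame(
  Class          = rownames(pct),
  Male_percent   = paste0(round(pct[, "male"], 1), "%"),
  Female_percent = paste0(round(pct[, "female"], 1), "%")
)
res
```

Selecting the columns by name ("male", "female") rather than by position keeps the result correct even though table() sorts the levels alphabetically.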
Or using data.table
library(data.table)
# setDT() converts df1 to a data.table by reference
setDT(df1)[, .N, by = .(class, sex)
  ][, .(Male_percent   = paste0(100 * N[sex == 'male'] / sum(N), '%'),
        Female_percent = paste0(100 * N[sex == 'female'] / sum(N), '%')),
    by = class]
#    class Male_percent Female_percent
# 1:     1          50%            50%
# 2:     2          40%            60%
Or using dplyr
library(dplyr)
df1 %>%
  group_by(class) %>%
  summarise(Male_Percent   = sprintf('%d%%', 100 * sum(sex == 'male') / n()),
            Female_Percent = sprintf('%d%%', 100 * sum(sex == 'female') / n()))
#   class Male_Percent Female_Percent
# 1     1          50%            50%
# 2     2          40%            60%
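Note that the '%d' format only accepts whole numbers, so the call above stops with an error as soon as a class has a split such as 1 out of 3. A possible variation, shown on a small made-up data frame (df2 is hypothetical, not from the question), takes the mean of the logical comparison and formats it with one decimal place:

```r
library(dplyr)

# Hypothetical data: class 1 has 1 male out of 3, class 2 has 2 males out of 2
df2 <- data.frame(
  sex   = c("male", "female", "female", "male", "male"),
  class = c(1, 1, 1, 2, 2)
)

res2 <- df2 %>%
  group_by(class) %>%
  summarise(
    # mean() of a logical vector is the share of TRUE values
    Male_Percent   = sprintf("%.1f%%", 100 * mean(sex == "male")),
    Female_Percent = sprintf("%.1f%%", 100 * mean(sex == "female"))
  )
res2
```

Here class 1 comes out as 33.3% and 66.7%, which the '%d' version could not format.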
Or using sqldf
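library(sqldf)
# Step 1: integer percentages per class; string literals use single quotes as in standard SQL
res1 <- sqldf("select class,
                 100 * sum(sex = 'male') / count(sex) as m,
                 100 * sum(sex = 'female') / count(sex) as f,
                 '%' as p
               from df1
               group by class")
# Step 2: append the percent sign with the SQL concatenation operator ||
sqldf("select class,
         m || p as Male_Percent,
         f || p as Female_Percent
       from res1")
#   class Male_Percent Female_Percent
# 1     1          50%            50%
# 2     2          40%            60%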
Update
Based on @G.Grothendieck's comment, the sqldf solution can be simplified to
sqldf("select class,
         (100 * avg(sex = 'male')) || '%' as Male_Percent,
         (100 * avg(sex = 'female')) || '%' as Female_Percent
       from df1
       group by class")
#   class Male_Percent Female_Percent
# 1     1        50.0%          50.0%
# 2     2        40.0%          60.0%
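The avg() version works because the comparison evaluates to 0 or 1, so its average is the share of matching rows. The same idea can be sketched in base R without extra packages, since mean() of a logical vector also gives the share of TRUE values (male_share, female_share and res3 are illustrative names):

```r
# Example data from the question (the grade column is not needed here)
df1 <- data.frame(
  sex   = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Per-class share of rows where the comparison is TRUE
male_share   <- tapply(df1$sex == "male", df1$class, mean)
female_share <- tapply(df1$sex == "female", df1$class, mean)

res3 <- data.frame(
  class          = names(male_share),
  Male_Percent   = sprintf("%.1f%%", 100 * as.numeric(male_share)),
  Female_Percent = sprintf("%.1f%%", 100 * as.numeric(female_share))
)
res3
```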
Regarding "r - How to count and calculate percentages for two columns in an R data.frame?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30951617/