My data frame looks like this:
id fruit1 fruit2 fruit3
1  apple  banana orange
2  banana
3  apple  apple
4  banana apple
5  orange apple
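Is there a way to find, for each string, the percentage of individuals (rows) that have it, using 5 as the denominator? A string that appears more than once in a row should be counted only once for that row.
So the result would be apple = .8, banana = .6, and orange = .4.
The actual dataset I'm working with is large, so it would be great if the solution didn't require typing out each string.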
Best answer
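One option in the tidyverse is to reshape to "long" format, keep one row per id and value, pivot back to "wide" as 0/1 indicator columns, and take the mean of each column: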
library(dplyr)
library(tidyr)
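dd %>%
  # reshape to long format: one row per id/column pair
  pivot_longer(cols = -id) %>%
  # remove the blank entries
  filter(value != '') %>%
  # keep each value once per id, so repeats within a row are not double counted
  distinct(id, value) %>%
  # reshape to wide format: one 0/1 indicator column per distinct value
  pivot_wider(names_from = value, values_from = value,
              values_fn = list(value = length), values_fill = list(value = 0)) %>%
  # the mean of each indicator column is the share of ids with that value;
  # selecting -id avoids typing the value names
  summarise(across(-id, mean))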
# A tibble: 1 x 3
# apple banana orange
# <dbl> <dbl> <dbl>
#1 0.8 0.6 0.4
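If the result is easier to work with as one row per string, the pivot back to wide can be skipped. The following is only a sketch of that variation, not part of the original answer: it counts the distinct ids per value and divides by the total number of ids. The names `total_ids`, `res`, and `prop` are made up for illustration, and the sample data is repeated so the block runs on its own. It assumes missing entries are empty strings rather than `NA`.

```r
library(dplyr)
library(tidyr)

# sample data from the question (blank cells are empty strings)
dd <- data.frame(
  id = 1:5,
  fruit1 = c("apple", "banana", "apple", "banana", "orange"),
  fruit2 = c("banana", "", "apple", "apple", "apple"),
  fruit3 = c("orange", "", "", "", "")
)

# denominator: the number of individuals
total_ids <- n_distinct(dd$id)

res <- dd %>%
  pivot_longer(cols = -id) %>%       # long format: one row per id/column pair
  filter(value != "") %>%            # drop the blank entries
  distinct(id, value) %>%            # count each value once per id
  count(value) %>%                   # n = number of ids that have the value
  mutate(prop = n / total_ids)       # share of all ids

res
```

On the sample data, `prop` should come out as 0.8 for apple, 0.6 for banana, and 0.4 for orange. Note that an id whose columns are all blank is still included in `total_ids`, which matches the fixed denominator of 5 asked for in the question.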
Data
dd <- structure(list(id = 1:5, fruit1 = c("apple", "banana", "apple",
"banana", "orange"), fruit2 = c("banana", "", "apple", "apple",
"apple"), fruit3 = c("orange", "", "", "", "")),
class = "data.frame", row.names = c(NA,
-5L))
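For comparison, the same proportions can probably be obtained without any packages. This is a base R sketch under the same assumptions (blank cells are empty strings, `id` is the only non-fruit column); `fruit_cols`, `values`, and `props` are hypothetical names introduced here, not something from the original answer.

```r
# sample data, repeated so the block is self-contained
dd <- data.frame(
  id = 1:5,
  fruit1 = c("apple", "banana", "apple", "banana", "orange"),
  fruit2 = c("banana", "", "apple", "apple", "apple"),
  fruit3 = c("orange", "", "", "", "")
)

# every column except id holds the strings of interest
fruit_cols <- setdiff(names(dd), "id")

# distinct non-blank strings, discovered from the data rather than typed out
values <- setdiff(unique(unlist(dd[fruit_cols])), "")

# for each string: flag rows where it appears at least once, then average the flags
props <- sapply(values, function(v) {
  mean(rowSums(dd[fruit_cols] == v, na.rm = TRUE) > 0)
})

props
```

Because each row is reduced to a single TRUE/FALSE flag per string before averaging, a string repeated within one row (such as apple in row 3) still counts once.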
Regarding "r - Find the percentage of strings in a data frame, counting each row only once", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63092410/