Using a dataset like original:
id <- c("JF", "GH", "GH", "ANN", "GH", "ROG", "JF")
group <- c("most", "least", "most", "least", "least", "most", "least")
NP <- c(4,6,18,1,3,12,8)
iso_USA <- c(1, 0, 0, 0, 0, 1, 1)
iso_CHN <- c(0, 1, 1, 0, 0, 0, 0)
color <- c("blue", "orange", "blue", "blue", "red", "orange", "black")
original <- data.frame(id, group, NP, iso_USA, iso_CHN, color)
numeric <- unlist(lapply(original, is.numeric))
numeric <- names(original[ , numeric])
char <- unlist(lapply(original, is.character))
char <- names(original[ , char])
char <- char[-1] #remove id from variables of interest
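I want to group by "group" and compute the median of the numeric variables and the mode of the character variables, so that the data looks like original2. Note that my actual dataset has many more columns than the mock version shown here: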
group <- c("least", "most")
NP <- c(6,12)
iso_USA <- c(0,1)
iso_CHN <- c(0, 0)
color <- c("orange", "blue")
original2 <- data.frame(group, NP, iso_USA, iso_CHN, color)
Any clues?
Best answer
Using dplyr's across function and the accepted answer at the FAQ about implementing a mode function:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
library(dplyr)
original %>%
  select(-id) %>%
  group_by(group) %>%
  summarize(
    across(where(is.numeric), median),
    across(where(is.character), Mode)
  )
# # A tibble: 2 × 5
#   group    NP iso_USA iso_CHN color
#   <chr> <dbl>   <dbl>   <dbl> <chr>
# 1 least   4.5       0       0 orange
# 2 most   12         1       0 blue
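To see why the least group comes out with NP = 4.5 and color = orange, here is a small base-R check of the two summary functions on that group's values, copied from original. It is only a sketch of the tie-breaking behaviour: Mode is the function from the answer (repeated so the snippet runs on its own), and least_NP and least_color are illustrative names that do not appear in the original post.

```r
# Mode as defined in the answer, repeated so this snippet is self-contained.
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # which.max returns the first maximum, so ties go to the earliest value
}

# Values of the "least" group, taken from rows 2, 4, 5 and 7 of `original`.
# (least_NP and least_color are hypothetical helper names for this illustration.)
least_NP    <- c(6, 1, 3, 8)
least_color <- c("orange", "blue", "red", "black")

median(least_NP)   # 4.5: with an even count, the two middle values (3 and 6) are averaged
Mode(least_color)  # "orange": every color occurs once, so the first one seen wins

# Values of the "most" group, taken from rows 1, 3 and 6.
Mode(c("blue", "blue", "orange"))  # "blue"
```

A mode computed this way depends on row order whenever there is a tie. If that matters for the real data, it may be worth sorting first or choosing an explicit tie-breaking rule before summarizing.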
Regarding r - median of all numeric and mode of all character columns in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70189952/