I want to apply aggregate functions and a percentage function to columns. I found threads that discuss aggregation (Calculating multiple aggregations with lapply(.SD, ...) in data.table R package) and threads that discuss percentages (How to obtain percentages per value for the keys in R using data.table? and Use data.table to calculate the percentage of occurrence depending on the category in another column), but none that covers both.
Please note that I am looking for a data.table-based approach; dplyr does not work well on the actual dataset.
Here is the code to generate sample data:
set.seed(10)
IData <- data.frame(let = sample( x = LETTERS, size = 10000, replace=TRUE), numbers1 = sample(x = c(1:20000),size = 10000), numbers2 = sample(x = c(1:20000),size = 10000))
IData$let<-as.character(IData$let)
data.table::setDT(IData)
Here is the code that produces the output using dplyr:
# %>% is available once dplyr (or magrittr) is attached
library(dplyr)
Output <- IData %>%
  dplyr::group_by(let) %>%
  dplyr::summarise(numbers1.mean = as.double(mean(numbers1)),
                   numbers1.median = as.double(median(numbers1)),
                   numbers2.mean = as.double(mean(numbers2)),
                   sum.numbers1.n = sum(numbers1)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(perc.numbers1 = sum.numbers1.n/sum(sum.numbers1.n)) %>%
  # keep let so the result matches the output shown below
  dplyr::select(let, numbers1.mean, numbers1.median, numbers2.mean, perc.numbers1)
Sample output (head)
If I run head(Output), I get:
let numbers1.mean numbers1.median numbers2.mean perc.numbers1
<chr> <dbl> <dbl> <dbl> <dbl>
N 10320.951 10473.0 9374.435 0.03567927
H 9683.590 9256.5 9328.035 0.03648391
L 10223.322 10226.0 9806.210 0.04005400
S 9922.486 9618.0 10233.849 0.03678742
C 9592.620 9226.0 9791.221 0.03517997
F 10323.867 10382.0 10036.561 0.03962035
Here is my attempt using data.table (unsuccessful):
IData[, as.list(unlist(lapply(.SD, function(x) list(mean=mean(x),median=median(x),sum=sum(x))))), by=let, .SDcols=c("numbers1","numbers2")] [,.(Perc = numbers1.sum/sum(numbers1.sum)),by=let]
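I have 2 questions:
a) How can I solve this with data.table?
b) I have seen that the threads above use prop.table. Could someone show me how to use this function?
I sincerely appreciate any guidance.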
Best answer
We can use a similar approach with data.table:
res <- IData[, .(numbers1.mean = mean(numbers1),
numbers1.median = median(numbers1),
numbers2.mean=mean(numbers2),
sum.numbers1.n = sum(numbers1)), let
][, perc.numbers1 := sum.numbers1.n/sum(sum.numbers1.n)
][, c("let", "numbers1.mean", "numbers1.median",
"numbers2.mean", "perc.numbers1"), with = FALSE]
head(res)
# let numbers1.mean numbers1.median numbers2.mean perc.numbers1
#1: N 10320.951 10473.0 9374.435 0.03567927
#2: H 9683.590 9256.5 9328.035 0.03648391
#3: L 10223.322 10226.0 9806.210 0.04005400
#4: S 9922.486 9618.0 10233.849 0.03678742
#5: C 9592.620 9226.0 9791.221 0.03517997
#6: F 10323.867 10382.0 10036.561 0.03962035
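On question (b): for a plain numeric vector, prop.table(x) is the same as x / sum(x), so it can replace the manual division in the chain above. The following is only a sketch under that assumption, rebuilding the sample data from the question; the names res2 and wide are made up for illustration. It also shows one possible repair of the lapply(.SD, ...) attempt: the second step in that attempt groups by let again, so every group has a single row and the share is always 1. Dropping the second by and adding the column with := avoids that.

```r
library(data.table)

# Same sample data as in the question
set.seed(10)
IData <- data.table(
  let      = sample(LETTERS, size = 10000, replace = TRUE),
  numbers1 = sample(1:20000, size = 10000),
  numbers2 = sample(1:20000, size = 10000)
)

# Aggregate per group first, then compute shares across ALL groups.
# prop.table(v) on a numeric vector returns v / sum(v).
res2 <- IData[, .(numbers1.mean   = mean(numbers1),
                  numbers1.median = as.double(median(numbers1)),
                  numbers2.mean   = mean(numbers2),
                  sum.numbers1.n  = sum(numbers1)), by = let
            ][, perc.numbers1 := prop.table(sum.numbers1.n)
            ][, sum.numbers1.n := NULL]   # drop the helper column

# lapply(.SD, ...) variant: note there is no `by` in the second step,
# so the share is taken over the whole aggregated table.
wide <- IData[, as.list(unlist(lapply(.SD, function(x)
                 list(mean = mean(x), median = median(x), sum = sum(x))))),
              by = let, .SDcols = c("numbers1", "numbers2")
            ][, Perc := prop.table(numbers1.sum)]

head(res2)
```

Both res2$perc.numbers1 and wide$Perc should sum to 1 and agree with each other, since they are computed from the same per-group sums.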
Source: "r - Calculating percentages and other functions with data.table", a similar question on Stack Overflow: https://stackoverflow.com/questions/44511942/