I have a data frame of roughly 50k measurements taken by roughly 3000 investigators.

INVESTIGATOR_ID   SAMPLE_ID   MEASUREMENT
1000              38942       20.1
1000              38942       10.2
1001              38432       5.6
1002              553         10.6
...

My goal is to compare sample measurements per investigator to measurements from the entire data set:
- For each investigator, count those measurements that are +/- one standard deviation from the measurement mean collected by that investigator.
- For the entire data frame, count those measurements that are +/- one standard deviation from the mean.
- For each investigator that has sample measurements +/- one standard deviation from the mean, run a Fisher's exact test to determine if the number of samples is significant (compared to the entire data frame).
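
The first two counting steps above could be sketched in base R roughly as follows. This is only an illustration: the toy data, the helper count_within_sd, and the names per_inv, total_n and total_sd are made up, and it assumes "+/- one standard deviation" means "within one standard deviation of the mean".

```r
# Hypothetical toy data in the same shape as the data frame above
measurements <- data.frame(
  INVESTIGATOR_ID = c(1000, 1000, 1000, 1001, 1001, 1002),
  SAMPLE_ID       = c(38942, 38942, 38943, 38432, 38433, 553),
  MEASUREMENT     = c(20.1, 10.2, 15.0, 5.6, 7.4, 10.6)
)

# Count values lying within one standard deviation of their own mean
# (assumed reading of "+/- one standard deviation")
count_within_sd <- function(x) {
  if (length(x) < 2) return(0L)  # sd() is NA for a single value
  sum(abs(x - mean(x)) <= sd(x))
}

# Per-investigator totals and within-1-SD counts; tapply() orders its
# groups by sorted ID, which matches sort(unique(...))
ids <- measurements$INVESTIGATOR_ID
per_inv <- data.frame(
  INVESTIGATOR_ID      = sort(unique(ids)),
  NUMBER_OF_SAMPLES    = as.vector(tapply(measurements$MEASUREMENT, ids, length)),
  NUMBER_OF_SAMPLES_SD = as.vector(tapply(measurements$MEASUREMENT, ids, count_within_sd))
)

# The same two counts for the entire data frame
total_n  <- nrow(measurements)
total_sd <- count_within_sd(measurements$MEASUREMENT)
```

Investigators with a single measurement get a count of 0 here because a standard deviation cannot be computed for them. Whether they should be dropped instead is a separate decision.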
I've used the plyr library (ddply) to summarise the data by INVESTIGATOR_ID. Merging data together, the end result is a data frame, where each row consists of an investigator ID, the number of samples measured by that investigator, number of samples measured by that investigator +/- 1 SD, 15000, and 50000 (where 15000 and 50000 are the corresponding sample numbers +/- 1 SD and the total number of samples for the entire data frame).

INVESTIGATOR_ID   NUMBER_OF_SAMPLES   NUMBER_OF_SAMPLES_SD   15000   50000

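How can I take each row of the data frame, convert fields c(2:5) to a matrix, run a Fisher's test, and create a new data frame of the results?

Thanks for any advice.

Best answer

Something like this (adapted from a script of my own, so it may need further changes to fit your needs):
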
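get_fisher <- function(row) {
  # Fields 2:5 of the row fill a 2x2 matrix column by column
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  f <- fisher.test(mat, alternative = "two.sided")
  # Return the investigator ID together with the p-value
  c(row[1], p_value = f$p.value)
}
# apply() returns one column per input row, so transpose before
# converting the result to a data frame
fishers <- as.data.frame(t(apply(df, 1, get_fisher)))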
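
If the investigator IDs are not numeric, apply() converts every row to character first, so a variant that indexes the rows directly might be safer. The sketch below is a possible alternative rather than the original script: summary_df, the column names TOTAL_SD and TOTAL, and the counts are hypothetical, and the 2x2 matrix is filled column by column from fields 2:5 exactly as in get_fisher.

```r
# Hypothetical summary table in the layout described in the question
summary_df <- data.frame(
  INVESTIGATOR_ID      = c(1000, 1001, 1002),
  NUMBER_OF_SAMPLES    = c(20, 15, 30),
  NUMBER_OF_SAMPLES_SD = c(5, 10, 9),
  TOTAL_SD             = 15000,
  TOTAL                = 50000
)

# One p-value per row; fields 2:5 fill the 2x2 matrix column by column
p_values <- vapply(seq_len(nrow(summary_df)), function(i) {
  counts <- as.numeric(unlist(summary_df[i, 2:5]))
  fisher.test(matrix(counts, ncol = 2), alternative = "two.sided")$p.value
}, numeric(1))

# New data frame of results, one row per investigator
fisher_df <- data.frame(
  INVESTIGATOR_ID = summary_df$INVESTIGATOR_ID,
  P_VALUE         = p_values
)
```

With about 3000 investigators this yields about 3000 p-values, so a multiple-testing correction such as p.adjust(fisher_df$P_VALUE, method = "BH") may be worth considering before calling any single investigator significant.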

A similar question about running a Fisher's test on each row of a data frame in R can be found on Stack Overflow: https://stackoverflow.com/questions/14983579/