Running Fisher's test on each row of a data frame in R

Tags: r matrix dataframe

I have a data frame containing about 50k measurements taken by about 3,000 investigators.

INVESTIGATOR_ID  SAMPLE_ID  MEASUREMENT
1000             38942      20.1
1000             38942      10.2
1001             38432      5.6
1002             553        10.6
...

My goal is to compare sample measurements per investigator to measurements from the entire data set:

  1. For each investigator, count those measurements that are +/- one standard deviation from the measurement mean collected by that investigator.
  2. For the entire data frame, count those measurements that are +/- one standard deviation from the mean.
  3. For each investigator that has sample measurements +/- one standard deviation from the mean, run a Fisher's exact test to determine if the number of samples is significant (compared to the entire data frame).

I've used the plyr package (ddply) to summarise the data by INVESTIGATOR_ID. After merging the results, I end up with a data frame in which each row holds an investigator ID, the number of samples measured by that investigator, the number of those samples that are +/- 1 SD, 15000, and 50000 (where 15000 and 50000 are the number of samples +/- 1 SD and the total number of samples for the entire data frame).

INVESTIGATOR_ID  NUMBER_OF_SAMPLES  NUMBER_OF_SAMPLES_SD  15000  50000
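
For reference, a summary table of this shape could be built roughly as follows. This is only a sketch: it uses base R (tapply) rather than the plyr code mentioned above, the toy data and the names count_outside_sd, TOTAL_SD and TOTAL are made up for illustration, and it assumes "+/- one standard deviation" means "more than one SD away from the mean".

```r
# Toy data standing in for the real measurements (values are invented).
set.seed(1)
measurements <- data.frame(
  INVESTIGATOR_ID = rep(c(1000, 1001, 1002), times = c(40, 30, 30)),
  MEASUREMENT     = rnorm(100, mean = 10, sd = 3)
)

# Hypothetical helper: count values lying more than one SD from the mean of x.
# ASSUMPTION: this is one reading of "+/- one standard deviation".
count_outside_sd <- function(x) {
  if (length(x) < 2) return(0L)  # sd() is NA for a single value
  sum(abs(x - mean(x)) > sd(x))
}

# Step 1: per-investigator sample counts and counts outside 1 SD.
n_samples <- tapply(measurements$MEASUREMENT, measurements$INVESTIGATOR_ID, length)
n_sd      <- tapply(measurements$MEASUREMENT, measurements$INVESTIGATOR_ID,
                    count_outside_sd)

# Step 2: the same count for the whole data set, repeated on every row
# together with the overall number of samples.
summary_df <- data.frame(
  INVESTIGATOR_ID      = as.numeric(names(n_samples)),
  NUMBER_OF_SAMPLES    = as.vector(n_samples),
  NUMBER_OF_SAMPLES_SD = as.vector(n_sd),
  TOTAL_SD             = count_outside_sd(measurements$MEASUREMENT),
  TOTAL                = nrow(measurements)
)
print(summary_df)
```

The last two columns play the role of the 15000 and 50000 columns in the layout above.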

How can I take each row of the data frame, convert fields c(2:5) into a matrix, run a Fisher's test on it, and build a new data frame of the results?

Thanks for any suggestions.

Best answer

Something like this (adapted from a script of my own; it may need further changes to fit your needs):

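# Run Fisher's exact test on one row of the summary data frame.
get_fisher <- function(row) {
  # Columns 2-5 hold the four counts; matrix() fills the 2x2 table column by column.
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  f <- fisher.test(mat, alternative = "two.sided")
  c(row[1], p.value = f$p.value)
}

# apply() returns one column per input row, so transpose before
# converting to a data frame with one row per investigator.
fishers <- as.data.frame(t(apply(df, 1, get_fisher)))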
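
To see the row-wise approach end to end, here is a small self-contained sketch. The toy table, its counts, and the names row_p_value, TOTAL_SD, TOTAL and P_VALUE are all invented for illustration; the last two count columns stand in for the 15000 and 50000 columns from the question.

```r
# Hypothetical summary table; every count here is made up.
toy <- data.frame(
  INVESTIGATOR_ID      = c(1000, 1001, 1002),
  NUMBER_OF_SAMPLES    = c(40, 25, 60),
  NUMBER_OF_SAMPLES_SD = c(12, 20, 18),
  TOTAL_SD             = c(15000, 15000, 15000),
  TOTAL                = c(50000, 50000, 50000)
)

# Same idea as the answer: columns 2-5 of a row become a 2x2 matrix,
# filled column by column, and fisher.test() is run on it.
row_p_value <- function(row) {
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  fisher.test(mat)$p.value
}

# apply(..., 1, f) calls f once per row; a scalar result per row
# comes back as a plain vector, which slots straight into a data frame.
result <- data.frame(
  INVESTIGATOR_ID = toy$INVESTIGATOR_ID,
  P_VALUE         = apply(toy, 1, row_p_value)
)
print(result)
```

Because matrix() fills by column, the first column of each 2x2 table holds the investigator's two counts and the second column holds the two data-set-wide counts. If a different layout is wanted, reorder the columns before building the matrix.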

A similar question about running Fisher's test on each row of a data frame in R can be found on Stack Overflow: https://stackoverflow.com/questions/14983579/
