r - How do I write a permutation equivalent of Mood's median test in R? (using permutation to get the p-value)

Tags: r testing permutation median

I can do this for the two-sample t-test, but not for the median test, the Wilcoxon test, or the Hodges-Lehmann test.

data_2000 <- c(500,450,600,700,550,551,552)

data_2019 <- c(560,460,620,720,540,600,750)

mean(data_2000)

mean(data_2019)

mean(data_2019) - mean(data_2000)

combined_data <- c(data_2000, data_2019)

set.seed(123)

null_dist <- c()
for (i in 1:100000) {
  shuffled_data <- sample(combined_data)  
  shuffled_2000 <- shuffled_data[1:7] 
  shuffled_2019 <- shuffled_data[8:14]  
  null_dist[i] <- mean(shuffled_2019) - mean(shuffled_2000)
}

(p_value <- (sum(null_dist >= 49.57143) + sum(null_dist <= -49.57143)) / length(null_dist))

Best answer

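I think this is what you want to do. I changed your code as little as possible. There are packages such as infer that will do this for you, and a for loop is not the most efficient approach, but it is good enough to help you learn. Since we are looping anyway, I compute both the mean difference and the median difference in the same pass, because every other part of the code is identical. ifelse is a simple way to turn each comparison into a 1 or a 0 so the results can be summed.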


data_2000 <- c(500,450,600,700,550,551,552)
data_2019 <- c(560,460,620,720,540,600,750)

delta_mean <- mean(data_2019) - mean(data_2000)
delta_median <- median(data_2019) - median(data_2000)

combined_data <- c(data_2000, data_2019)

trials <- 100000

set.seed(123)

# Preallocate the result vectors instead of growing them inside the loop
mean_diff <- numeric(trials)
median_diff <- numeric(trials)

for (i in 1:trials) {
   shuffled_data <- sample(combined_data)  
   shuffled_2000 <- shuffled_data[1:7] 
   shuffled_2019 <- shuffled_data[8:14]  
   mean_diff[i] <- mean(shuffled_2019) - mean(shuffled_2000)
   median_diff[i] <- median(shuffled_2019) - median(shuffled_2000)
}

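# Two-sided p-value: the share of shuffles whose difference is more extreme
# than the observed one in either direction (this form assumes the observed
# difference is positive, which it is for these data).
# Note: the strict > and < do not count shuffles that exactly tie the observed
# difference; use >= and <= to include them. The printed results below come
# from the strict version.
p_mean <- sum(ifelse(mean_diff > delta_mean | mean_diff < -1 * delta_mean, 1, 0)) / trials
p_median <- sum(ifelse(median_diff > delta_median | median_diff < -1 * delta_median, 1, 0)) / trials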

p_mean
#> [1] 0.31888
p_median
#> [1] 0.24446
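
With only 14 observations, random shuffling is not strictly necessary: there are choose(14, 7) = 3432 distinct ways to split the pooled data into two groups of 7, so the permutation distribution can be enumerated exactly. The following is a minimal sketch of that idea, not part of the original answer. The names `splits`, `perm_diffs` and `p_exact` are my own, and it counts ties with the observed difference (`>=`), so its result should not be expected to match the simulated p_median above.

```r
data_2000 <- c(500, 450, 600, 700, 550, 551, 552)
data_2019 <- c(560, 460, 620, 720, 540, 600, 750)

combined <- c(data_2000, data_2019)
n_2000 <- length(data_2000)

# Observed difference in medians (2019 minus 2000)
obs <- median(data_2019) - median(data_2000)

# Each column of `splits` holds the positions assigned to the "2000" group;
# the remaining positions form the "2019" group.
splits <- combn(length(combined), n_2000)

# Difference in medians for every possible relabelling of the pooled data
perm_diffs <- apply(splits, 2, function(idx) {
  median(combined[-idx]) - median(combined[idx])
})

# Exact two-sided p-value: share of relabellings at least as extreme as observed
p_exact <- mean(abs(perm_diffs) >= abs(obs))
p_exact
```

Because every split is visited exactly once, no seed is needed and the result is the same on every run.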

Following up on your question about the HL test. Quoting Wikipedia:

The Hodges–Lehmann statistic also estimates the difference between two populations. For two sets of data with m and n observations, the set of two-element sets made of them is their Cartesian product, which contains m × n pairs of points (one from each set); each such pair defines one difference of values. The Hodges–Lehmann statistic is the median of the m × n differences.

You can run it on your data with the following code...

Do not run this 100,000 times: the answer is the same every time, because a single pass already covers all 49 possible pairs.

hl_df <- expand.grid(data_2019, data_2000)
hl_df$pair_diffs <- hl_df$Var1 - hl_df$Var2
median(hl_df$pair_diffs)
#> [1] 49
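
The code above gives the HL point estimate, not a p-value. If you want a permutation p-value for it, the same shuffling idea works with any two-sample statistic plugged in. Below is a rough sketch under my own assumptions: `perm_test`, `hl_stat`, `rank_sum_centered` and `mood_stat` are hypothetical helper names, not functions from any package. The rank-sum statistic is centred at its null expectation so that a two-sided comparison on absolute values makes sense, and `mood_stat` uses the idea behind Mood's median test (comparing how many values in each group lie above the pooled median) rather than the difference in medians.

```r
data_2000 <- c(500, 450, 600, 700, 550, 551, 552)
data_2019 <- c(560, 460, 620, 720, 540, 600, 750)

# Generic two-sample permutation test (hypothetical helper).
# `stat` is any function of (x, y) whose null distribution is centred near 0.
perm_test <- function(x, y, stat, trials = 10000) {
  combined <- c(x, y)
  first <- seq_along(x)
  obs <- stat(x, y)
  perm <- replicate(trials, {
    s <- sample(combined)
    stat(s[first], s[-first])
  })
  # Two-sided p-value; the +1 counts the observed arrangement itself
  (sum(abs(perm) >= abs(obs)) + 1) / (trials + 1)
}

# Hodges-Lehmann statistic: median of all length(x) * length(y) differences
hl_stat <- function(x, y) median(outer(y, x, "-"))

# Wilcoxon rank-sum of y, minus its expected value under the null
rank_sum_centered <- function(x, y) {
  r <- rank(c(x, y))  # ties receive mid-ranks
  sum(r[-seq_along(x)]) - length(y) * (length(r) + 1) / 2
}

# Mood-style statistic: difference in the share of values above the pooled median
mood_stat <- function(x, y) {
  m <- median(c(x, y))
  mean(y > m) - mean(x > m)
}

set.seed(123)
p_hl   <- perm_test(data_2000, data_2019, hl_stat)
p_w    <- perm_test(data_2000, data_2019, rank_sum_centered)
p_mood <- perm_test(data_2000, data_2019, mood_stat)
c(hl = p_hl, wilcoxon = p_w, mood = p_mood)
```

I used 10,000 shuffles here only to keep the run time short; raise `trials` if you need a more stable estimate.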

Regarding "r - How do I write a permutation equivalent of Mood's median test in R? (using permutation to get the p-value)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63315645/
