r - Difference in values between different columns meeting a given condition

Here is my toy data. I have val and the quartile variables q0 to q4.

 df <- tibble::tribble(
      ~val, ~q0, ~q1, ~q2,  ~q3, ~q4, ~q, ~diff,
       15L, 15L, 15L, 15L,   15, 15L, 4L,     0,
       17L,  2L, 16L, 30L,   34, 54L, 2L,    13,
       29L,  2L, 16L, 30L,   34, 54L, 2L,     1,
       25L,  2L, 17L, 20L,   26, 43L, 3L,     1 )

I need to compute the last two variables, q and diff:

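When val lies between q1 and q2, I pick 2 (from q2) as the value of q (second row).
In case of a tie, I pick the largest of the qs (e.g. q = 4 in the first row).
diff is the difference between the value in the selected q column and val. So for row 1, q4 - val = 0, and for row 2, q2 - val = 30 - 17 = 13.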
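
The rules above can be sketched for a single row in base R. This is only an illustration under stated assumptions: `pick_q` is a hypothetical helper name, and it assumes the quartile boundaries are passed in increasing order starting with q0.

```r
# pick_q is a hypothetical helper for illustration, not part of the question
pick_q <- function(val, qs) {
  # Count the boundaries val has reached; cap at the last quantile index
  # so that a tie across all boundaries resolves to the largest q
  q <- min(sum(val >= qs), length(qs) - 1)
  # qs[1] holds q0, so the boundary for quantile n sits at position n + 1
  c(q = q, diff = qs[[q + 1]] - val)
}

row2 <- pick_q(17, c(2, 16, 30, 34, 54))   # q = 2, diff = 13
row1 <- pick_q(15, c(15, 15, 15, 15, 15))  # q = 4, diff = 0 (tie resolved to q4)
```
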

How can I compute q and diff in R, preferably with the tidyverse? Perhaps we can build on the answer here: Extract column name and specific value based on a condition.

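Best answer

When you have more complex logic like this, I find it is usually better to wrap it in a function. It will be easier to maintain, read, and debug later. I am also extra careful when using lots of nested ifelse statements or a large case_when-style statement. In the accepted answer, q can only be 2, 3, or 4. There is no case for q being 1, which you would certainly want as an option in your final product.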

df <- tibble::tribble(
~val, ~q0, ~q1, ~q2,  ~q3, ~q4, ~q, ~diff,
15L, 15L, 15L, 15L,   15, 15L, 4L,     0,
17L,  2L, 16L, 30L,   34, 54L, 2L,    13,
29L,  2L, 16L, 30L,   34, 54L, 2L,     1,
25L,  2L, 17L, 20L,   26, 43L, 3L,     1 )

whichQ <- function(df, qs = c('q0', 'q1', 'q2', 'q3', 'q4')) {
    # This has the flexibility of changing your column names / using more or less Q splits
    qDf <- df[, qs]
    # This finds the right quantile by finding how many you are larger than
    # It works because the q's are sequential
    whichGreater <- df$val >= qDf
    q <- apply(whichGreater, 1, sum)
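    # The last quantile is a special case because there is no next quantile,
    # so cap q at the highest quantile index (4 for q0..q4)
    q <- pmin(q, length(qs) - 1)
    df$q <- q
    # Go through the Qs we found and grab the value of that column
    diff <- sapply(seq_along(q), function(x) {
        # [[ extracts the bare value, so non-integer quantiles are not truncated
        qDf[[x, q[x] + 1]]
    })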
    # Get the difference
    df$diff <- diff - df$val
    df
}
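
The same counting idea can also be written without the row-by-row sapply. The following is a minimal sketch, not the answer's own code: `whichQVec` is a hypothetical name, and the toy data is rebuilt as a plain data.frame (instead of a tibble) so the block runs in base R without extra packages.

```r
# Toy data from the question, without the expected q and diff columns
df <- data.frame(
  val = c(15, 17, 29, 25),
  q0  = c(15,  2,  2,  2),
  q1  = c(15, 16, 16, 17),
  q2  = c(15, 30, 30, 20),
  q3  = c(15, 34, 34, 26),
  q4  = c(15, 54, 54, 43)
)

# whichQVec is a hypothetical name used only for this sketch
whichQVec <- function(df, qs = c("q0", "q1", "q2", "q3", "q4")) {
  qMat <- as.matrix(df[, qs])
  # Count the boundaries each val has reached, capped at the last quantile
  q <- pmin(rowSums(df$val >= qMat), length(qs) - 1)
  # Matrix indexing picks one boundary per row: (row i, column q[i] + 1)
  picked <- qMat[cbind(seq_len(nrow(df)), q + 1)]
  df$q <- q
  df$diff <- picked - df$val
  df
}

res <- whichQVec(df)
# q is 4, 2, 2, 3 and diff is 0, 13, 1, 1, matching the expected columns
res[, c("val", "q", "diff")]
```

Indexing a matrix with a two-column matrix of (row, column) pairs is what lets one value be pulled from a different column in every row in a single step.
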

You can still use whichQ with tidyverse pipes, and as long as you give the function a useful name, it will be clearer (I think) what is going on.

library(magrittr)  # provides %>%; library(dplyr) or library(tidyverse) also attaches it

df %>% 
    whichQ() %>% 
    head(2)

Regarding "r - Difference in values between different columns meeting a given condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53285617/
