r - How can I create this variable in R?

Using R, consider the following test dataset:

testdat<-data.frame("id"=c(rep(1,5),rep(2,5),rep(3,5)),
                    "period"=rep(seq(1:5),3),
                    "treat"=c(c(0,1,1,1,0),c(0,0,1,1,1),c(0,0,1,1,1)),
                    "state"=c(rep(0,5),c(0,1,1,1,1),c(0,0,0,1,1)),
                    "int"=c(rep(0,13),1,1))
testdat
   id period treat state int
1   1      1     0     0   0
2   1      2     1     0   0
3   1      3     1     0   0
4   1      4     1     0   0
5   1      5     0     0   0
6   2      1     0     0   0
7   2      2     0     1   0
8   2      3     1     1   0
9   2      4     1     1   0
10  2      5     1     1   0
11  3      1     0     0   0
12  3      2     0     0   0
13  3      3     1     0   0
14  3      4     1     1   1
15  3      5     1     1   1

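The first four variables are the ones I have; int is the variable I want to create. It resembles an interaction between treat and state, but a plain interaction would put 1s in rows 8-10, which I do not want. Essentially, I want the interaction to be 1 only when state changes while treat is on, and 0 otherwise. Any ideas on how to create it (especially for a large dataset with a million observations)?

Edit: to clarify why I want this measure. I want to run a regression like the following: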

lm(outcome~treat+state+I(treat*state))

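But I am really only interested in the interaction when treat spans a change in state. If I ran the regression above, I(treat*state) would pool the interaction effect I care about with the case where state is already 1 for the whole time treat is 1. In theory, I think these produce two different effects, so I need to separate them. I hope this makes sense, and I am happy to provide more details.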
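To make the intended split concrete, here is a minimal sketch using the test data from the question. The column names full_int and int_pre are hypothetical, introduced only for illustration, and the regression in the final comment is just one possible way to use the two pieces.

```r
# Test data from the question; `int` is the target variable shown there.
testdat <- data.frame(
  id     = rep(1:3, each = 5),
  period = rep(1:5, times = 3),
  treat  = c(0, 1, 1, 1, 0,  0, 0, 1, 1, 1,  0, 0, 1, 1, 1),
  state  = c(0, 0, 0, 0, 0,  0, 1, 1, 1, 1,  0, 0, 0, 1, 1),
  int    = c(rep(0, 13), 1, 1)
)

# Plain interaction: 1 whenever treat and state are both 1.
testdat$full_int <- testdat$treat * testdat$state

# Hypothetical name: the remainder, i.e. state was already 1 when treat began.
testdat$int_pre <- testdat$full_int - testdat$int

which(testdat$int == 1)      # rows 14 and 15
which(testdat$int_pre == 1)  # rows 8, 9 and 10

# With a real outcome column, the two pieces could enter separately, e.g.
# lm(outcome ~ treat + state + int + int_pre, data = testdat)
```

The plain interaction is the sum of the two pieces, so the pair int and int_pre would replace I(treat*state) rather than sit next to it.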

Best answer

I'm sure this is possible in base R, but here is a tidyverse version:

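library(dplyr)
testdat %>%
  # start a new group each time treat switches from 0 to 1
  group_by(grp = cumsum(c(FALSE, diff(treat) > 0))) %>%
  # flag rows where state and treat are both on and state was off at the group's first row
  mutate(int2 = +(state > 0 & first(state) == 0 & treat > 0)) %>%
  ungroup() %>%
  select(-grp)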
# # A tibble: 15 x 6
#       id period treat state   int  int2
#    <dbl>  <int> <dbl> <dbl> <dbl> <int>
#  1     1      1     0     0     0     0
#  2     1      2     1     0     0     0
#  3     1      3     1     0     0     0
#  4     1      4     1     0     0     0
#  5     1      5     0     0     0     0
#  6     2      1     0     0     0     0
#  7     2      2     0     1     0     0
#  8     2      3     1     1     0     0
#  9     2      4     1     1     0     0
# 10     2      5     1     1     0     0
# 11     3      1     0     0     0     0
# 12     3      2     0     0     0     0
# 13     3      3     1     0     0     0
# 14     3      4     1     1     1     1
# 15     3      5     1     1     1     1
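For completeness, the same logic can be sketched in base R. This is only one possible translation of the pipeline above, not a tuned solution for a million rows; it relies on ave() to broadcast each group's first state value back to every row of that group.

```r
# Test data from the question.
testdat <- data.frame(
  id     = rep(1:3, each = 5),
  period = rep(1:5, times = 3),
  treat  = c(0, 1, 1, 1, 0,  0, 0, 1, 1, 1,  0, 0, 1, 1, 1),
  state  = c(0, 0, 0, 0, 0,  0, 1, 1, 1, 1,  0, 0, 0, 1, 1),
  int    = c(rep(0, 13), 1, 1)
)

# A new group starts each time treat switches from 0 to 1.
grp <- cumsum(c(FALSE, diff(testdat$treat) > 0))

# Value of state on the first row of each group, repeated for every row in it.
first_state <- ave(testdat$state, grp, FUN = function(s) s[1])

# 1 only where state is on, treat is on, and state was off when the group began.
testdat$int2 <- as.integer(
  testdat$state > 0 & first_state == 0 & testdat$treat > 0
)

# int2 reproduces the target column on the test data.
all(testdat$int2 == testdat$int)
```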

An alternative grouping logic uses run-length encoding and is effectively the same (suggested at https://stackoverflow.com/a/35313426 ):

testdat %>%
  group_by(grp = { yy <- rle(treat); rep(seq_along(yy$lengths), yy$lengths); }) %>%
  # ...
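To see what the run-length grouping produces, here is a small standalone sketch on the treat column alone (the values are copied from the test data):

```r
treat <- c(0, 1, 1, 1, 0,  0, 0, 1, 1, 1,  0, 0, 1, 1, 1)

# rle() collapses the vector into consecutive runs of equal values.
yy <- rle(treat)
yy$lengths  # 1 3 3 3 2 3
yy$values   # 0 1 0 1 0 1

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run.
grp <- rep(seq_along(yy$lengths), yy$lengths)
grp         # 1 2 2 2 3 3 3 4 4 4 5 5 6 6 6
```

Unlike the cumsum(diff(...) > 0) grouping, every switch in treat (0 to 1 and 1 to 0) starts a new group here. The first row of each treated group is the same either way, which is why the result does not change.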

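As in the linked answer, I wish dplyr had an equivalent of data.table's rleid. The intended logic is to group by consecutive runs of the same value in a column, rather than by all rows that share that value. If you look at the intermediate pipeline (before grp is dropped), you will see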

testdat %>%
  group_by(grp = { yy <- rle(treat); rep(seq_along(yy$lengths), yy$lengths); }) %>%
  mutate(int2 = +(state > 0 & first(state) == 0 & treat > 0)) %>%
  ungroup()
# # A tibble: 15 x 7
#       id period treat state   int   grp  int2
#    <dbl>  <int> <dbl> <dbl> <dbl> <int> <int>
#  1     1      1     0     0     0     1     0
#  2     1      2     1     0     0     2     0
#  3     1      3     1     0     0     2     0
#  4     1      4     1     0     0     2     0
#  5     1      5     0     0     0     3     0
#  6     2      1     0     0     0     3     0
#  7     2      2     0     1     0     3     0
#  8     2      3     1     1     0     4     0
#  9     2      4     1     1     0     4     0
# 10     2      5     1     1     0     4     0
# 11     3      1     0     0     0     5     0
# 12     3      2     0     0     0     5     0
# 13     3      3     1     0     0     6     0
# 14     3      4     1     1     1     6     1
# 15     3      5     1     1     1     6     1
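One thing to watch in the output above: grp 3 covers rows 5-7, which belong to ids 1 and 2, because the runs are computed over the whole column. If the data are a panel and runs should not cross ids (an assumption on my part, the question does not say so), adding id to the grouping is one way to handle it. The tiny data frame d below is hypothetical, built only to show a case where the two groupings differ.

```r
library(dplyr)

# Run-length ids, written out from the rle() expression used in the pipeline.
my_rleid <- function(x) {
  yy <- rle(x)
  rep(seq_along(yy$lengths), yy$lengths)
}

# Hypothetical data: id 1 ends treated with state 0,
# id 2 starts treated with state already 1.
d <- data.frame(
  id    = c(1, 1, 2, 2),
  treat = c(1, 1, 1, 1),
  state = c(0, 0, 1, 1)
)

# Runs over the whole column: id 2 inherits first(state) from id 1.
by_run <- d %>%
  group_by(grp = my_rleid(treat)) %>%
  mutate(int2 = +(state > 0 & first(state) == 0 & treat > 0)) %>%
  ungroup() %>%
  select(-grp)

# Runs restarted within each id.
by_id <- d %>%
  group_by(id, grp = my_rleid(treat)) %>%
  mutate(int2 = +(state > 0 & first(state) == 0 & treat > 0)) %>%
  ungroup() %>%
  select(-grp)

by_run$int2  # 0 0 1 1
by_id$int2   # 0 0 0 0
```

On the test data from the question both groupings give the same int2, so this only matters when a treated run continues across an id boundary.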

But that is just wishful thinking. I suppose I could also do

my_rleid <- function(x) { yy <- rle(x); rep(seq_along(yy$lengths), yy$lengths); }
testdat %>%
  group_by(grp = my_rleid(treat)) %>%
  # ...
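As far as I know, newer dplyr releases (1.1.0 and later) ship consecutive_id(), which fills the same role as data.table's rleid. Assuming such a version is installed, the helper above is not needed and the pipeline might be written as the following sketch:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for consecutive_id()

# Test data from the question.
testdat <- data.frame(
  id     = rep(1:3, each = 5),
  period = rep(1:5, times = 3),
  treat  = c(0, 1, 1, 1, 0,  0, 0, 1, 1, 1,  0, 0, 1, 1, 1),
  state  = c(0, 0, 0, 0, 0,  0, 1, 1, 1, 1,  0, 0, 0, 1, 1),
  int    = c(rep(0, 13), 1, 1)
)

res <- testdat %>%
  # one group per consecutive run of equal treat values
  group_by(grp = consecutive_id(treat)) %>%
  mutate(int2 = +(state > 0 & first(state) == 0 & treat > 0)) %>%
  ungroup() %>%
  select(-grp)

# int2 matches the target column on the test data.
all(res$int2 == res$int)
```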

Regarding "r - How can I create this variable in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61595883/
