r - Create a variable conditional on the sum of a variable by group

Tags: r sum data.table conditional-statements

I have a data.table as follows:

panelID = c(1:50)   
year    = c(2001:2010)
country = c("NLD", "BEL", "GER")
urban   = c("A", "B", "C")
indust  = c("D", "E", "F")
sizes   = c(1, 2, 3, 4, 5)
n <- 2

library(data.table)

set.seed(123)
DT <- data.table(
    panelID = rep(sample(panelID), each = n),
    country = rep(sample(country, length(panelID), replace = T), each = n),
    year    = c(replicate(length(panelID), sample(year, n))),
    some_NA = sample(0:5, 6),                                             
    some_NA_factor = sample(0:5, 6),
    industry       = rep(sample(indust, length(panelID), replace = T), each = n),
    urbanisation   = rep(sample(urban, length(panelID), replace = T), each = n),
    size      = rep(sample(sizes, length(panelID), replace = T), each = n),
    norm      = round(runif(100)/10, 2),
    sales     = round(rnorm(10, 10, 10), 2),
    Happiness = sample(10, 10),
    Sex       = round(rnorm(10, 0.75, 0.3), 2),
    Age       = sample(100, 100),
    Educ      = round(rnorm(10, 0.75, 0.3), 2)
)        
DT [, uniqueID := .I]  # Creates a unique ID     
DT[DT == 0] <- NA 
DT$sales[DT$sales< 0] <- NA 
DT <- as.data.frame(DT)

What I want is the number of panelIDs for which the sum of size equals 8. So I thought I would do this:

DT[sum(size)==8, condition:=1, by=panelID]

What am I doing wrong here?

Best answer

Using data.table:

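setDT(DT)  # the question's last line turned DT into a data.frame; := needs a data.table
res <- DT[, conditional := ifelse(sum(size) == 8, 1, 0), by = panelID][]
# To count the panelIDs flagged 1 (each panelID once, not once per row):
# res[conditional == 1, uniqueN(panelID)]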

Or simply, as @chinsoon12 suggested:

DT[, conditional := +(sum(size)==8), panelID]

Result (head):

 panelID country year some_NA some_NA_factor industry urbanisation size norm sales
1:      31     GER 2010       4              1        F            C    4 0.09  5.63
2:      31     GER 2005       2             NA        F            C    4 0.03 13.31
3:      15     NLD 2005      NA              4        D            C    3 0.05    NA
4:      15     NLD 2008       1              5        D            C    3 0.01 12.12
5:      14     BEL 2003       5              3        E            B    1 0.09 22.37
6:      14     BEL 2002       3              2        E            B    1 0.04 30.38
   Happiness  Sex Age Educ uniqueID conditional
1:         7 0.69  62 0.25        1           1
2:         3 1.00  10 1.31        2           1
3:        10 0.66  59 0.73        3           0
4:         9 0.85  49 0.88        4           0
5:         2 0.34   7 0.90        5           0
6:         5 0.84  61 1.11        6           0
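
As for why the original attempt does not work: in `DT[i, j, by]`, the `i` expression is evaluated once on the whole table, before any grouping, so `sum(size) == 8` there compares the grand total with 8 rather than each panel's total. The sketch below illustrates this on a small made-up table (`toy` and its values are hypothetical, not the question's data) and then counts the matching panelIDs, which is what the question asks for.

```r
library(data.table)

# Hypothetical toy data: three panels with two rows each
toy <- data.table(panelID = c(1, 1, 2, 2, 3, 3),
                  size    = c(4, 4, 3, 3, 4, 4))

# What an expression in i sees: the sum over all rows (8 + 6 + 8 = 22)
total_sum <- toy[, sum(size)]

# What is actually wanted: the sum within each panelID (8, 6, 8)
group_sums <- toy[, .(total = sum(size)), by = panelID]

# Put the condition in j so it is evaluated per group;
# unary + turns the logical into an integer 0/1
toy[, conditional := +(sum(size) == 8), by = panelID]

# Number of distinct panelIDs whose sizes sum to 8
n_panels <- toy[conditional == 1, uniqueN(panelID)]
```

Counting with `uniqueN(panelID)` rather than `nrow()` matters here because every panelID appears on more than one row.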

Regarding "r - Create a variable conditional on the sum of a variable by group", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60993550/
