I have a data.table as follows:
panelID = c(1:50)
year = c(2001:2010)
country = c("NLD", "BEL", "GER")
urban = c("A", "B", "C")
indust = c("D", "E", "F")
sizes = c(1, 2, 3, 4, 5)
n <- 2
library(data.table)
set.seed(123)
DT <- data.table(
panelID = rep(sample(panelID), each = n),
country = rep(sample(country, length(panelID), replace = T), each = n),
year = c(replicate(length(panelID), sample(year, n))),
some_NA = sample(0:5, 6),
some_NA_factor = sample(0:5, 6),
industry = rep(sample(indust, length(panelID), replace = T), each = n),
urbanisation = rep(sample(urban, length(panelID), replace = T), each = n),
size = rep(sample(sizes, length(panelID), replace = T), each = n),
norm = round(runif(100)/10, 2),
sales = round(rnorm(10, 10, 10), 2),
Happiness = sample(10, 10),
Sex = round(rnorm(10, 0.75, 0.3), 2),
Age = sample(100, 100),
Educ = round(rnorm(10, 0.75, 0.3), 2)
)
DT [, uniqueID := .I] # Creates a unique ID
DT[DT == 0] <- NA
DT$sales[DT$sales< 0] <- NA
DT <- as.data.frame(DT)
What I want is the number of panelIDs for which the sum of size equals 8. So I thought I would do this:
DT[sum(size)==8, condition:=1, by=panelID]
What am I doing wrong here?
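A minimal sketch on a hypothetical toy table (toy and its values are invented for illustration, and the data.table package is assumed to be installed) shows two problems with the attempt. First, the setup code ends with as.data.frame(), so := is unavailable until the object is converted back with setDT(). Second, an expression in i is evaluated once over all rows, before any grouping; by only groups the evaluation of j.

```r
library(data.table)

# Hypothetical toy data: two rows per panelID, like the question's DT (n = 2)
toy <- data.frame(panelID = c(1, 1, 2, 2, 3, 3),
                  size    = c(4, 4, 3, 5, 1, 2))

# The question's setup ends with as.data.frame(); := needs a data.table,
# so convert back by reference first
setDT(toy)

# An expression in i sees all rows at once: sum(size) is 19 here, so
# sum(size) == 8 is a single FALSE rather than one result per panel
whole_table_sum <- toy[, sum(size)]

# Inside j, by = panelID evaluates sum(size) separately for each panel
per_panel <- toy[, .(total = sum(size)), by = panelID]

# Putting the condition in j flags every row of panels whose sizes sum to 8
toy[, condition := as.integer(sum(size) == 8), by = panelID]
print(toy)
```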
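Best answer
Using data.table, put the condition in j, where by evaluates sum(size) separately for each panelID (in i it is evaluated once over the whole table, before grouping):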
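setDT(DT)  # the setup code ended with as.data.frame(); := needs a data.table
DT[, conditional := ifelse(sum(size) == 8, 1, 0), by = panelID][]
# To count the panelIDs that meet the condition, save the above as res and count
# distinct IDs (each panel has n = 2 rows, so counting rows would double the total):
# res[conditional == 1, uniqueN(panelID)]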
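Because each panelID occupies n = 2 rows, counting flagged rows overstates the number of panels. A sketch of counting distinct panelIDs instead, either by aggregating once per panel and counting matches with .N, or by applying uniqueN() to the flag column (the toy table is again hypothetical, and data.table is assumed to be installed):

```r
library(data.table)

# Hypothetical toy data: two rows per panelID
toy <- data.table(panelID = c(1, 1, 2, 2, 3, 3),
                  size    = c(4, 4, 3, 5, 1, 2))

# One row per panel with the summed size, then count the panels equal to 8
totals <- toy[, .(total = sum(size)), by = panelID]
n_panels <- totals[total == 8, .N]

# The same count from a flag column set by reference, using distinct IDs
toy[, conditional := +(sum(size) == 8), by = panelID]
n_panels_flag <- toy[conditional == 1, uniqueN(panelID)]

# Counting rows instead double-counts: 4 flagged rows for 2 panels
n_rows_flag <- toy[conditional == 1, .N]
```

The aggregate-first version avoids adding a column to the original table; the flag version is convenient when the row-level flag is needed anyway.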
Or, more simply, as @chinsoon12 suggested:
DT[, conditional := +(sum(size)==8), panelID]
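The shorter form relies on unary + coercing a logical to integer, so +(sum(size)==8) yields 1L or 0L. A base-R sketch comparing it with the ifelse() version (the flags vector is hypothetical): ifelse(cond, 1, 0) returns doubles while + returns integers, so the two conditional columns agree in value but not in type.

```r
# Hypothetical logical results, one per panel
flags <- c(TRUE, FALSE, TRUE)

via_ifelse <- ifelse(flags, 1, 0)   # double: 1 0 1
via_plus   <- +(flags)              # integer: 1 0 1
via_int    <- as.integer(flags)     # an explicit alternative, also integer

print(typeof(via_ifelse))  # "double"
print(typeof(via_plus))    # "integer"
```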
Result (head):
panelID country year some_NA some_NA_factor industry urbanisation size norm sales
1: 31 GER 2010 4 1 F C 4 0.09 5.63
2: 31 GER 2005 2 NA F C 4 0.03 13.31
3: 15 NLD 2005 NA 4 D C 3 0.05 NA
4: 15 NLD 2008 1 5 D C 3 0.01 12.12
5: 14 BEL 2003 5 3 E B 1 0.09 22.37
6: 14 BEL 2002 3 2 E B 1 0.04 30.38
Happiness Sex Age Educ uniqueID conditional
1: 7 0.69 62 0.25 1 1
2: 3 1.00 10 1.31 2 1
3: 10 0.66 59 0.73 3 0
4: 9 0.85 49 0.88 4 0
5: 2 0.34 7 0.90 5 0
6: 5 0.84 61 1.11 6 0
Source: Stack Overflow question on creating a variable conditional on the by-group sum of a variable (tag r): https://stackoverflow.com/questions/60993550/