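r - Filtering a data.table by two variables, an elegant and fast way

I would like to ask whether there is a way to filter on a combination of several variables. More specifically: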

library(dplyr)
library(plyr)
library(data.table)

data <- iris %>% cbind( group = rep(c("a", "b", "c"), nrow(iris))) %>% as.data.table()

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species group
1:          5.1         3.5          1.4         0.2  setosa     a
2:          4.9         3.0          1.4         0.2  setosa     b
3:          4.7         3.2          1.3         0.2  setosa     c
4:          4.6         3.1          1.5         0.2  setosa     a
5:          5.0         3.6          1.4         0.2  setosa     b
6:          5.4         3.9          1.7         0.4  setosa     c

I would like to filter it against the following data table:

filter <- data.table(Species = c("setosa", "versicolor", 'setosa'), group = c('a', "b", 'c'))
      Species group      filter1
1:     setosa     a     setosa a
2: versicolor     b versicolor b
3:     setosa     c     setosa c

I can do it like this:

data[paste(Species, group) %in% filter[, filter1 := paste(Species, group)]$filter1]

But I would like to know whether there is a more efficient, faster, or easier way to do this, perhaps something like:

data[.(Species, group) %in% filter] # does not work

Best answer

In this case, you can join data to filter on every column of filter:

data[filter, on=names(filter), nomatch=0]
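
As a rough check, the join can be compared with the paste-based approach from the question. This is only a sketch: it rebuilds the data with `length.out = .N` so that the table has exactly one group label per iris row (an assumption, not the question's exact construction), and the names `joined` and `pasted` are made up for illustration.

```r
library(data.table)

# Assumed setup: iris plus a group column cycling through a, b, c (150 rows)
data <- as.data.table(iris)
data[, group := rep(c("a", "b", "c"), length.out = .N)]

filter <- data.table(Species = c("setosa", "versicolor", "setosa"),
                     group   = c("a", "b", "c"))

# Join on all columns of `filter`; nomatch = 0 drops filter rows without a match
joined <- data[filter, on = names(filter), nomatch = 0]

# The paste-based approach from the question, computed without modifying `filter`
pasted <- data[paste(Species, group) %in% filter[, paste(Species, group)]]

# Both approaches should select the same number of rows
nrow(joined) == nrow(pasted)
```

Note that the join returns rows grouped in the order of filter, whereas the paste-based subset keeps the original row order of data.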

See "Perform a semi-join with data.table" for similar filtering joins.
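
One way a semi-join differs from the plain join is when filter contains duplicated rows: the join then repeats the matching rows of data. The sketch below shows one possible workaround using the `which = TRUE` argument of `[.data.table`, which returns matching row numbers instead of rows. The duplicated filter row and the names `idx`, `joined`, and `semi` are assumptions made for illustration, and this is not necessarily the approach taken in the linked question.

```r
library(data.table)

# Assumed setup: iris plus a group column cycling through a, b, c (150 rows)
data <- as.data.table(iris)
data[, group := rep(c("a", "b", "c"), length.out = .N)]

# Hypothetical filter with a repeated row ("setosa", "a")
filter <- data.table(Species = c("setosa", "versicolor", "setosa", "setosa"),
                     group   = c("a", "b", "c", "a"))

# Plain join: rows matched by the repeated filter row appear twice
joined <- data[filter, on = names(filter), nomatch = 0]

# Semi-join sketch: take the matching row numbers of `data`,
# drop duplicates, and restore the original row order
idx  <- data[filter, on = names(filter), which = TRUE, nomatch = 0]
semi <- data[sort(unique(idx))]
```

Here semi contains each matching row of data once and in its original order, while joined contains the "setosa"/"a" rows twice.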

Source: a similar question on Stack Overflow, "r - Filtering a data.table by two variables, an elegant and fast way": https://stackoverflow.com/questions/46691368/
