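Given a vector of numbers (e.g. 1 5 10 12), I am looking for the numbers in the vector that fall within a range I choose (e.g. c(9, 11)). In this small example I would like to get back c(10).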
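As a reference point, the small example can be written as a plain filter. This is a minimal C++ sketch (C++ because the answer below also uses it). The name `keep_between` is hypothetical, and the sketch assumes inclusive bounds, which is the convention of `dplyr::between`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: keep every x in `values` with lo <= x <= hi.
// Inclusive bounds, as in dplyr::between; the input does not need to be sorted.
std::vector<int> keep_between(const std::vector<int>& values, int lo, int hi) {
    assert(lo <= hi);  // an inverted range is most likely a caller error
    std::vector<int> out;
    for (int x : values) {
        if (x >= lo && x <= hi) {
            out.push_back(x);
        }
    }
    return out;
}
```

Called as `keep_between({1, 5, 10, 12}, 9, 11)`, it returns a vector containing only 10. Every element is tested once, so the cost grows linearly with the length of the input.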
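Below is a larger MWE in which I use dplyr::between to subset the relevant values. However, I am looking for a faster way to do this (parallelization is not an acceptable solution). Let me know if I can explain anything better.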
# Data
set.seed(1)
targets <- sort(sample(1:1e8, 1e7, replace=FALSE))
vec <- c(1345706, 1405938)
# Function
dplyr_between <- function(vec, targets) {
require(dplyr)
targets <- targets[dplyr::between(targets, vec[1], vec[2])]
return(targets)
}
test <- dplyr_between(vec, targets)
# 1345732 1345761 1345779 1345780 1345797
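Because `targets` comes out of `sort()`, all matches sit in one contiguous run, so its boundaries can be located by binary search instead of testing all 1e7 elements. This is a different technique from the ones in the question, shown as a minimal C++ sketch built on `std::lower_bound` and `std::upper_bound`. The name `between_sorted` is hypothetical, and the code assumes the input is sorted ascending.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: all v in an ascending-sorted vector with lo <= v <= hi.
// Two binary searches find the run of matches in O(log n) comparisons;
// only the matching elements are then copied.
std::vector<int> between_sorted(const std::vector<int>& sorted, int lo, int hi) {
    assert(lo <= hi);  // checked in debug builds only
    // First element >= lo.
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo);
    // First element > hi, searching only from `first` onward. If lo > hi
    // (release builds), every element from `first` on is > hi, so this
    // returns `first` and the result is empty.
    auto last = std::upper_bound(first, sorted.end(), hi);
    return std::vector<int>(first, last);
}
```

In R terms the same idea means computing two indices into `targets` and taking a single slice between them; base R's `findInterval()` is one possible way to obtain such indices on a sorted vector. The remaining cost is mostly copying the matches out.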
Edit: added a function using x < max & x > min, based on a (since deleted) comment.
# More Functions
base_compare <- function(vec, targets) {
targets <- targets[targets < vec[2] & targets > vec[1]]
return(targets)
}
base_compare(vec, targets)
# 1345732 1345761 1345779 1345780 1345797
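One difference worth noting: `dplyr::between(x, left, right)` is documented as a shortcut for `x >= left & x <= right`, so it keeps values equal to the bounds, whereas `base_compare` uses strict `<` and `>` and drops them. The two give the same result on this data only if neither 1345706 nor 1405938 occurs in `targets`. A small C++ sketch of both conventions follows; the function name and the `inclusive` flag are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: filter `values` to the closed range [lo, hi] when
// inclusive == true (like dplyr::between), or to the open range (lo, hi)
// when inclusive == false (like the strict comparisons in base_compare).
std::vector<int> filter_range(const std::vector<int>& values, int lo, int hi,
                              bool inclusive) {
    assert(lo <= hi);
    std::vector<int> out;
    for (int x : values) {
        bool keep = inclusive ? (x >= lo && x <= hi) : (x > lo && x < hi);
        if (keep) {
            out.push_back(x);
        }
    }
    return out;
}
```

With values {5, 10, 15} and bounds 5 and 15, the inclusive version keeps all three values, while the strict version keeps only 10.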
Edit: added a version using data.table::inrange, as suggested by @docendo.
# inrange function
dt_inrange <- function(vec, targets) {
require(data.table)
targets <- targets[inrange(targets, vec[1], vec[2])]
return(targets)
}
dt_inrange(vec, targets)
# 1345732 1345761 1345779 1345780 1345797
Benchmark:
library(microbenchmark)
microbenchmark(dplyr_between(vec, targets), base_compare(vec, targets), dt_inrange(vec, targets), times=10L)
# Unit: milliseconds
#                        expr      min       lq     mean   median       uq      max neval
# dplyr_between(vec, targets) 265.5192 283.5998 296.0947 296.7552 309.4403 323.3634    10
#  base_compare(vec, targets) 303.4629 317.8389 343.6311 343.3765 354.6891 427.1962    10
#    dt_inrange(vec, targets) 129.3800 131.1634 142.8658 144.4569 149.3728 164.5824    10
Thanks!
Best answer
A simple Rcpp implementation:
C++ code in temp.cpp:
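#include <Rcpp.h>
#include <vector>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// Return the elements of `vec` with lower <= x <= upper (inclusive bounds,
// matching dplyr::between). Assumes `vec` is sorted ascending, so the loop
// can stop at the first element above `upper`.
// [[Rcpp::export]]
std::vector<int> betweenRcpp(IntegerVector vec, int lower, int upper) {
  std::vector<int> ret;
  for (R_xlen_t i = 0; i < vec.size(); i++) {
    if (vec[i] > upper) {
      break;  // sorted input: nothing after this element can match
    }
    if (vec[i] >= lower) {
      ret.push_back(vec[i]);
    }
  }
  return ret;
}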
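For readers without an R setup, here is the same early-exit idea as a standalone C++ sketch. `std::vector<int>` replaces `IntegerVector`, the name `between_early_exit` is hypothetical, the bounds are inclusive (matching `dplyr::between`), and the input is assumed to be sorted ascending.

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone port: linear scan over ascending-sorted input that
// stops at the first element above `upper`, since nothing after it can match.
std::vector<int> between_early_exit(const std::vector<int>& sorted, int lower,
                                    int upper) {
    assert(lower <= upper);
    std::vector<int> ret;
    for (int x : sorted) {
        if (x > upper) {
            break;  // sorted input: every later element is also > upper
        }
        if (x >= lower) {
            ret.push_back(x);
        }
    }
    return ret;
}
```

The loop only touches the prefix of the input that ends at the top of the range. In the question the range ends at 1405938, roughly 1.4% of the way through 1:1e8, so with uniformly sampled targets the scan visits on the order of 140,000 of the 1e7 elements; that is plausibly a large part of the speedup in the benchmark. A range near the top of 1:1e8 would make the scan visit almost every element, and starting from a binary search such as `std::lower_bound` would avoid that.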
R code:
library(Rcpp)
library(microbenchmark)
setwd("~/Desktop")  # directory containing temp.cpp; adjust to your setup
# Data
set.seed(1)
targets <- sort(sample(1:1e8, 1e7, replace=FALSE))
vec <- c(1345706, 1405938)
# Function
dplyr_between <- function(vec, targets) {
require(dplyr)
targets <- targets[dplyr::between(targets, vec[1], vec[2])]
return(targets)
}
sourceCpp("temp.cpp")
test <- dplyr_between(vec, targets)
test2 <- betweenRcpp(targets, vec[1], vec[2])
microbenchmark(dplyr_between(vec, targets), betweenRcpp(targets, vec[1], vec[2]), times=10)
Unit: microseconds
                                expr       min        lq        mean      median        uq        max neval cld
         dplyr_between(vec, targets) 72066.027 77809.681 108023.3793 103723.4075 125280.89 173892.552    10   b
betweenRcpp(targets, vec[1], vec[2])   439.124   464.475    502.7439    481.8025    543.12    594.578    10   a
Testing that the two solutions agree:
all(test == test2)
Regarding "r - Fastest way to find all numbers between target numbers", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47019127/