r - 为什么有些逻辑运算符这么慢？

在尝试优化我的代码时，我发现了一些 logical与 integer 上的类似操作相比，操作比我预期的要慢或 numeric .

所以我开始重写基本的 bool 运算符 ! , & , | , xor如下:

my.not <- function(x) as.logical(1L - as.integer(x))
my.and <- function(e1, e2) as.logical(as.integer(e1) * as.integer(e2))
my.or  <- function(e1, e2) as.logical(as.integer(e1) + as.integer(e2))
my.xor <- function(e1, e2) as.logical(as.integer(e1) + as.integer(e2) == 1L)

测试一切是否按预期工作:

a <- sample(c(TRUE, FALSE), 1e6, TRUE)
b <- sample(c(TRUE, FALSE), 1e6, TRUE)

identical(!a, my.not(a))             # TRUE
identical(a & b, my.and(a, b))       # TRUE
identical(a | b, my.or(a, b))        # TRUE
identical(xor(a, b), my.xor(a, b))   # TRUE

现在进行基准测试:

library(microbenchmark)
microbenchmark(!a, my.not(a),
               a & b, my.and(a, b),
               a | b, my.or(a, b),
               xor(a, b), my.xor(a, b))

# Unit: milliseconds
#          expr       min        lq    median         uq       max neval
#            !a  1.237437  1.459042  1.463259   1.492671  17.28209   100
#     my.not(a)  6.018455  6.263176  6.414515  15.291194  70.16313   100
#         a & b 32.318530 32.667525 32.769014  32.973878  50.55528   100
#  my.and(a, b)  8.010022  8.592776  8.750786  18.145590  78.38736   100
#         a | b 32.030545 32.383769 32.506937  32.820720 102.43609   100
#   my.or(a, b) 12.089538 12.434793 12.663695  22.046841  32.19095   100
#     xor(a, b) 94.892791 95.480200 96.072202 106.104000 164.19937   100
#  my.xor(a, b) 13.337110 13.708025 14.048350  24.485478  29.75883   100

查看结果，!运营商是唯一一个似乎比我自己做得还不错的运营商。其他三个要慢几倍。有点尴尬Primitive职能。我什至希望实现良好的 bool 运算符应该比整数运算快得多(我是如何实现自己的函数的。)

问题:为什么？糟糕的实现？或者原始函数做了一些我的函数没有做的好事(例如错误检查、特殊情况)？

最佳答案

稍微看一下 C 实现，逻辑和数学运算以不同的方式实现它们的循环。逻辑操作执行类似(在 logic.c:327 中)

library(inline)

or1 <- cfunction(c(x="logical", y="logical"), "
    int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
    SEXP ans = PROTECT(allocVector(LGLSXP, n));
    int x1, y1;
    for (int i = 0; i < n; i++) {
        x1 = LOGICAL(x)[i % nx];
        y1 = LOGICAL(y)[i % ny];
        if ((x1 != NA_LOGICAL && x1) || (y1 != NA_LOGICAL && y1))
            LOGICAL(ans)[i] = 1;
        else if (x1 == 0 && y1 == 0)
            LOGICAL(ans)[i] = 0;
        else
            LOGICAL(ans)[i] = NA_LOGICAL;
    }
    UNPROTECT(1);
    return ans;
")

其中有两个模运算符 %每次迭代。相比之下，算术运算(在 Itermacros.h:54 中)执行类似的操作

or2 <- cfunction(c(x="logical", y="logical"), "
    int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
    SEXP ans = PROTECT(allocVector(LGLSXP, n));
    int x1, y1, ix=0, iy=0;
    for (int i = 0; i < n; i++) {
        x1 = LOGICAL(x)[ix];
        y1 = LOGICAL(x)[iy];
        if (x1 == 0 || y1 == 0)
            LOGICAL(ans)[i] = 0;
        else if (x1 == NA_LOGICAL || y1 == NA_LOGICAL)
            LOGICAL(ans)[i] = NA_LOGICAL;
        else
            LOGICAL(ans)[i] = 1;

        if (++ix == nx) ix = 0;
        if (++iy == ny) iy = 0;
    }
    UNPROTECT(1);
    return ans;
")

进行两次身份测试。这是一个跳过 NA 测试的版本

or3 <- cfunction(c(x="logical", y="logical"), "
    int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
    SEXP ans = PROTECT(allocVector(LGLSXP, n));
    int x1, y1, ix=0, iy=0;
    for (int i = 0; i < n; ++i) {
        x1 = LOGICAL(x)[ix];
        y1 = LOGICAL(y)[iy];
        LOGICAL(ans)[i] = (x1 || y1);
        if (++ix == nx) ix = 0;
        if (++iy == ny) iy = 0;
    }
    UNPROTECT(1);
    return ans;
")

然后是一个避免 LOGICAL 宏的版本

or4 <- cfunction(c(x="logical", y="logical"), "
    int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;
    SEXP ans = PROTECT(allocVector(LGLSXP, n));
    int *xp = LOGICAL(x), *yp = LOGICAL(y), *ansp = LOGICAL(ans);
    for (int i = 0, ix = 0, iy = 0; i < n; ++i)
    {
        *ansp++ = xp[ix] || yp[iy];
        ix = (++ix == nx) ? 0 : ix;
        iy = (++iy == ny) ? 0 : iy;
    }
    UNPROTECT(1);
    return ans;
")

这里有一些时间

microbenchmark(my.or(a, b), a|b, or1(a, b), or2(a, b), or3(a, b), or4(a, b))
Unit: milliseconds
        expr       min        lq    median       uq      max neval
 my.or(a, b)  8.002435  8.100143 10.082254 11.56076 12.05393   100
       a | b 23.194829 23.404483 23.860382 24.30020 24.96712   100
   or1(a, b) 17.323696 17.659705 18.069139 18.42815 19.57483   100
   or2(a, b) 13.040063 13.197042 13.692152 14.09390 14.59378   100
   or3(a, b)  9.982705 10.037387 10.578464 10.96945 11.48969   100
   or4(a, b)  5.544096  5.592754  6.106694  6.30091  6.94995   100

a|b的区别和 or1反射(reflect)了此处未实现的内容，例如属性和维度以及对象的特殊处理。来自 or1至 or2反射(reflect)不同回收方式的成本；我很惊讶这里有差异。来自 or2至 or3是 NA 安全的成本。有点难知道 or4 中是否有额外的加速将在基本 R 实现中看到——在用户 C 代码中 LOGICAL()是一个宏，但在基数 R 中它是一个内联函数调用。

代码是用 -O2 编译的旗帜和

> system("clang++ --version")
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
Target: x86_64-pc-linux-gnu
Thread model: posix

my.or倍在独立的 R session 之间不是特别一致，有时需要更长的时间；我不知道为什么。以上时间与 R 版本 2.15.3 已修补 (2013-03-13 r62579)；当前的 R-devel 似乎快了 10%。

关于r - 为什么有些逻辑运算符这么慢？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/18939282/

r - 为什么有些逻辑运算符这么慢？

上一篇：Scala 案例有 22 个字段，但在 Scala 2.11.5 中存在 play-json 问题

下一篇：haskell - 什么是 RWS Monad，何时使用