performance - Scala:嵌套 for 循环和 for 理解之间的性能差异

我的印象是，在 Scala (2.11.7) 中，以下代码片段将具有类似的性能特征，但看来我错了:

a: IndexedSeq[(Int, Int)]

选项 1:

var ans = 0
for {
  i <- a.indices
  (x1, y1) = a(i)
  j <- a.indices drop (i+1)
  (x2, y2) = a(j)
  if x1 == x2 || y1 == y2
} ans += 1

选项 2:

var ans = 0
for (i <- a.indices) {
  val (x1, y1) = a(i)
  for (j <- a.indices drop (i+1)) {
    val (x2, y2) = a(j)
    if (x1 == x2 || y1 == y2) ans += 1
  }
}

但是，即使对于小尺寸(a.length == 100)，第二种方式也比第一种方式快大约 5-10 倍。另外，a 在这里是一个 IndexedSeq，因此随机访问应该没有那么重要(参见 f1 与 f2以下)。这是完整的基准测试:

import scala.util.Random

object PerfTester extends App {

  def f1(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for {
      i <- a.indices
      j <- a.indices drop (i+1)
      ((x1, y1), (x2, y2)) = (a(i), a(j))
      if x1 == x2 || y1 == y2
    } ans += 1
    ans
  }

  def f2(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for {
      i <- a.indices
      (x1, y1) = a(i)
      j <- a.indices drop (i+1)
      (x2, y2) = a(j)
      if x1 == x2 || y1 == y2
    } ans += 1
    ans
  }

  def f3(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for (i <- a.indices) {
      val (x1, y1) = a(i)
      for (j <- a.indices drop (i+1)) {
        val (x2, y2) = a(j)
        if (x1 == x2 || y1 == y2) ans += 1
      }
    }
    ans
  }

  def profile[R](code: => R, t: Long = System.nanoTime()) = (code, (System.nanoTime() - t)/1e6)

  val n = 1000
  val data = IndexedSeq.fill(n) {
    Random.nextInt(100) -> Random.nextInt(100)
  }
  val (r1, t1) = profile(f1(data))
  val (r2, t2) = profile(f2(data))
  val (r3, t3) = profile(f3(data))
  require(r1 == r2 && r2 == r3)
  println(s"f1: $t1 ms")
  println(s"f2: $t2 ms")
  println(s"f3: $t3 ms")
}

我知道这类测试很容易受到 JVM 预热、热点优化和排序等的影响，因此我通过随机调用 f1、f2、f3 并在许多不同的运行中平均运行时间。

我在 codeforces.com 上进行编程竞赛时遇到了这个问题:

这“超出了时间限制”:http://codeforces.com/contest/629/submission/16248545
这已被“接受”:http://codeforces.com/contest/629/submission/16248804

最佳答案

尽管代码在概念上是等价的，但它并不是 Scala 编码前面的 for 循环的方式。特别是，当您编写 y = x 时，Scala 会执行单独的映射操作，并将答案捆绑到一个元组中。

如果您在命令行询问:

scala -Xprint:typer -e 'for {i <- 1 to 10; j = i*i } println(j)'

你得到(部分):

def main(args: Array[String]): Unit = {
  final class $anon extends scala.AnyRef {
    def <init>(): <$anon: AnyRef> = {
      $anon.super.<init>();
      ()
    };
    scala.this.Predef.intWrapper(1).to(10).map[(Int, Int), scala.collection.immutable.IndexedSeq[(Int, Int)]](((i: Int) => {
      val j: Int = i.*(i);
      scala.Tuple2.apply[Int, Int](i, j)
    }))(immutable.this.IndexedSeq.canBuildFrom[(Int, Int)]).foreach[Unit](((x$1: (Int, Int)) => (x$1: (Int, Int) @unchecked) match {
        case (_1: Int, _2: Int)(Int, Int)((i @ _), (j @ _)) => scala.this.Predef.println(j)
    }))
  };
  {
    new $anon();
    ()
  }
}

您可以在第 7 行看到额外的映射，并在第 9 行创建元组。然后只需在第 11 行再次提取它即可。与将其插入到 foreach 函数的闭包中相比，这是非常昂贵，特别是对于我在本例中使用的整数，因为它们必须装箱。

有人可能会说现有的方法还有待改进，但这就是目前的实现方式。

关于performance - Scala:嵌套 for 循环和 for 理解之间的性能差异，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/35528057/

performance - Scala:嵌套 for 循环和 for 理解之间的性能差异

上一篇：r - 将数据分配给剪切产生的级别 (R)

下一篇：php - 从 PHP 数组构建 Bootstrap 下拉多级菜单