java - How should I sum something with streams?

I have seen and tried several different ways of summing values in a stream. Here is my code:

List<Person> persons = new ArrayList<Person>();

for(int i=0; i < 10000000; i++){
    persons.add(new Person("random", 26));
}

Long start = System.currentTimeMillis();
int test = persons.stream().collect(Collectors.summingInt(p -> p.getAge()));
Long end = System.currentTimeMillis();
System.out.println("Sum of ages = " + test + " and it took : " + (end - start) + " ms with collectors");

Long start3 = System.currentTimeMillis();
int test3 = persons.parallelStream().collect(Collectors.summingInt(p -> p.getAge()));
Long end3 = System.currentTimeMillis();
System.out.println("Sum of ages = " + test3 + " and it took : " + (end3 - start3) + " ms with collectors and parallel stream");


Long start2 = System.currentTimeMillis();
int test2 = persons.stream().mapToInt(p -> p.getAge()).sum();
Long end2 = System.currentTimeMillis();
System.out.println("Sum of ages = " + test2 + " and it took : " + (end2 - start2) + " ms with map and sum");

Long start4 = System.currentTimeMillis();
int test4 = persons.parallelStream().mapToInt(p -> p.getAge()).sum();
Long end4 = System.currentTimeMillis();
System.out.println("Sum of ages = " + test4 + " and it took : " + (end4 - start4) + " ms with map and sum and parallel stream");

This gave me the following results:

Sum of ages = 220000000 and it took : 110 ms with collectors
Sum of ages = 220000000 and it took : 272 ms with collectors and parallel stream
Sum of ages = 220000000 and it took : 137 ms with map and sum
Sum of ages = 220000000 and it took : 134 ms with map and sum and parallel stream

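I ran it several times and got different results each time (most of the time the last solution was the fastest), so I would like to know:

1) What is the correct way to do it?

2) Why? (How does it differ from the other solutions?)

Best answer

Before we get to the actual answer, there are a few things you should know:

Your test results can vary considerably depending on many factors (for example, the computer you run it on). Here are the results of one run on my 8-core machine: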

Sum of ages = 260000000 and it took : 94 ms with collectors
Sum of ages = 260000000 and it took : 61 ms with collectors and parallel stream
Sum of ages = 260000000 and it took : 70 ms with map and sum
Sum of ages = 260000000 and it took : 94 ms with map and sum and parallel stream

And then on a later run:

Sum of ages = 260000000 and it took : 68 ms with collectors
Sum of ages = 260000000 and it took : 67 ms with collectors and parallel stream
Sum of ages = 260000000 and it took : 66 ms with map and sum
Sum of ages = 260000000 and it took : 109 ms with map and sum and parallel stream

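Micro-benchmarking is not a simple topic. There are ways to do it properly (I will get to that later), but simply timing with System.currentTimeMillis() will not work reliably in most cases.

Just because Java 8 makes parallel operations easy does not mean they should be used everywhere. Parallel operations make sense in some cases and not in others.

OK, now let's look at the various approaches you are using.

Sequential collector: the summingInt collector you use has the following implementation: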

public static <T> Collector<T, ?, Integer> summingInt(ToIntFunction<? super T> mapper) {
    return new CollectorImpl<>(
            () -> new int[1],
            (a, t) -> { a[0] += mapper.applyAsInt(t); },
            (a, b) -> { a[0] += b[0]; return a; },
            a -> a[0], Collections.emptySet());
}

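So, first a new array with a single element is created. Then, for every Person element in the stream, the collect function uses Person#getAge() to retrieve the age and adds it to the running total held in that one-element array. Finally, once the whole stream has been processed, it extracts the value from the array and returns it. Note that the mapper is a ToIntFunction, so the accumulation itself runs on primitive ints; boxing and unboxing come in where getAge() returns an Integer (not an int!) and when the final result is boxed to an Integer.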
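To make those four steps concrete, here is a minimal sketch that rebuilds the same collector by hand with the public Collector.of factory. The names SummingIntSketch, summingIntByHand and sumAges, as well as the small Person class, are assumptions made for illustration; the JDK itself uses its internal CollectorImpl class as shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;

class SummingIntSketch {

    // Hypothetical stand-in for the Person class used in the question.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        int getAge() {
            return age;
        }
    }

    // Mirrors the supplier / accumulator / combiner / finisher of summingInt.
    static <T> Collector<T, int[], Integer> summingIntByHand(ToIntFunction<? super T> mapper) {
        return Collector.of(
                () -> new int[1],                                        // supplier: one-element array as mutable box
                (acc, t) -> acc[0] += mapper.applyAsInt(t),              // accumulator: add the mapped int
                (left, right) -> { left[0] += right[0]; return left; },  // combiner: merge partial sums (parallel case)
                acc -> acc[0]);                                          // finisher: the int is boxed to Integer here
    }

    static int sumAges(List<Person> persons) {
        return persons.stream().collect(summingIntByHand(Person::getAge));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("a", 26), new Person("b", 26), new Person("c", 26));
        System.out.println(sumAges(persons)); // prints 78
    }
}
```

The combiner is only exercised when the stream runs in parallel; in a sequential run a single array is filled from start to finish.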

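Parallel collector: this uses the ReferencePipeline#forEach(Consumer) function to accumulate the ages it gets from the mapping function. The same boxing and unboxing considerations apply.

Sequential map and sum: here you map your Stream<Person> to an IntStream. One consequence is that no autoboxing or unboxing is needed any more, which can save a lot of time in some cases. The resulting stream is then summed using the following implementation: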

@Override
public final int sum() {
    return reduce(0, Integer::sum);
}

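The reduce function here calls ReduceOps#ReduceOp#evaluateSequential(PipelineHelper<T> helper, Spliterator<P_IN> spliterator).
In essence this applies Integer::sum across all the numbers, starting with 0 and the first number, then that result and the second number, and so on.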
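As a rough illustration of that left-to-right fold, the sketch below writes the reduction as a plain loop and compares it with IntStream#sum(). The class name SequentialReduceSketch and the helper foldLeft are assumptions made for this example; this is not how the JDK implements the pipeline internally, only what it computes.

```java
import java.util.function.IntBinaryOperator;
import java.util.stream.IntStream;

class SequentialReduceSketch {

    // What a sequential reduce(identity, op) computes, written as a plain loop.
    static int foldLeft(int[] values, int identity, IntBinaryOperator op) {
        int result = identity;
        for (int v : values) {
            result = op.applyAsInt(result, v); // combine the running result with the next value
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ages = {26, 26, 26, 26};
        int viaLoop = foldLeft(ages, 0, Integer::sum);
        int viaStream = IntStream.of(ages).sum();
        System.out.println(viaLoop + " " + viaStream); // prints 104 104
    }
}
```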

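Parallel map and sum: this is where things get interesting. It uses the same sum() function, but in this case reduce calls ReduceOps#ReduceOp#evaluateParallel(PipelineHelper<T> helper, Spliterator<P_IN> spliterator) instead of the sequential variant. This essentially uses a divide-and-conquer approach to add up the values. The big advantage of divide and conquer is, of course, that it can easily be done in parallel. However, it does require splitting and re-joining the stream many times, which costs time. How fast it is can therefore vary a lot, depending on the complexity of the actual work done per element. For plain addition it is probably not worth it in most cases; as you can see from my results, it was consistently one of the slower approaches.

Now, to really get an idea of how long each approach takes, let's do a proper micro-benchmark. I will use JMH with the following benchmark code: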
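The divide-and-conquer idea can be sketched with a RecursiveTask on the common fork/join pool. This is only an illustration of the technique: the stream library splits work through Spliterator and its own internal tasks, and the names DivideAndConquerSum, SumTask and the THRESHOLD value are assumptions picked for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class DivideAndConquerSum {

    // Hypothetical cutoff: ranges at or below this size are summed with a plain loop.
    static final int THRESHOLD = 1_000;

    static final class SumTask extends RecursiveTask<Integer> {
        private final int[] values;
        private final int from; // inclusive
        private final int to;   // exclusive

        SumTask(int[] values, int from, int to) {
            this.values = values;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) {
                int sum = 0;
                for (int i = from; i < to; i++) {
                    sum += values[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(values, from, mid);
            SumTask right = new SumTask(values, mid, to);
            left.fork();                     // run the left half asynchronously
            int rightSum = right.compute();  // compute the right half in this thread
            return left.join() + rightSum;   // re-join: this bookkeeping is the overhead
        }
    }

    static int parallelSum(int[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }
}
```

Every split creates task objects and every join needs synchronization, so when the per-element work is a single addition, that overhead can easily outweigh the gain from using several cores.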

package com.stackoverflow.user2352924;

import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MINUTES)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 10, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Threads(2)
public class MicroBenchmark {

    private static List<Person> persons = new ArrayList<>();

    private int test;

    static {
        for(int i=0; i < 10000000; i++){
            persons.add(new Person("random", 26));
        }
    }

    @Benchmark
    public void sequentialCollectors() {
        test = 0;
        test += persons.stream().collect(Collectors.summingInt(p -> p.getAge()));
    }

    @Benchmark
    public void parallelCollectors() {
        test = 0;
        test += persons.parallelStream().collect(Collectors.summingInt(p -> p.getAge()));
    }

    @Benchmark
    public void sequentialMapSum() {
        test = 0;
        test += persons.stream().mapToInt(p -> p.getAge()).sum();
    }

    @Benchmark
    public void parallelMapSum() {
        test = 0;
        test += persons.parallelStream().mapToInt(p -> p.getAge()).sum();
    }

}

The pom.xml for this Maven project looks like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.stackoverflow.user2352924</groupId>
    <artifactId>StackOverflow</artifactId>
    <version>1.0</version>
    <packaging>jar</packaging>

    <name>Auto-generated JMH benchmark</name>

    <prerequisites>
        <maven>3.0</maven>
    </prerequisites>

    <dependencies>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-core</artifactId>
            <version>${jmh.version}</version>
        </dependency>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-generator-annprocess</artifactId>
            <version>${jmh.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <jmh.version>0.9.5</jmh.version>
        <javac.target>1.8</javac.target>
        <uberjar.name>benchmarks</uberjar.name>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <compilerVersion>${javac.target}</compilerVersion>
                    <source>${javac.target}</source>
                    <target>${javac.target}</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <finalName>microbenchmarks</finalName>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>org.openjdk.jmh.Main</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>2.5</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>2.4</version>
                </plugin>
                <plugin>
                    <artifactId>maven-javadoc-plugin</artifactId>
                    <version>2.9.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>2.6</version>
                </plugin>
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.3</version>
                </plugin>
                <plugin>
                    <artifactId>maven-source-plugin</artifactId>
                    <version>2.2.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.17</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

</project>

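Make sure Maven also runs with Java 8, otherwise you will get ugly errors.

I won't go into detail here on how to use JMH (that is covered elsewhere), but these are the results I got: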

# Run complete. Total time: 00:08:48

Benchmark                                     Mode  Samples     Score  Score error    Units
c.s.u.MicroBenchmark.parallelCollectors      thrpt       10  3658,949      775,115  ops/min
c.s.u.MicroBenchmark.parallelMapSum          thrpt       10  2616,905      221,109  ops/min
c.s.u.MicroBenchmark.sequentialCollectors    thrpt       10  5502,160      439,024  ops/min
c.s.u.MicroBenchmark.sequentialMapSum        thrpt       10  6120,162      609,232  ops/min

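So, on my system at the time I ran these tests, the sequential map and sum was considerably faster, at roughly 6100 ops/min, whereas the parallel map and sum (which uses divide and conquer) managed only about 2600 ops/min. In fact, both sequential approaches were considerably faster than the parallel ones.

Now, in a situation that parallelizes more readily, for example one where the Person#getAge() function is much more complex than a simple getter, the parallel approach may well be the better solution. In the end it all depends on how efficiently the work can be run in parallel in the case being tested.

One more thing to remember: when in doubt, do a proper micro-benchmark. ;-)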
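As a sketch of what such a heavier case might look like, the example below replaces the getter with a hypothetical CPU-bound function, expensiveScore, invented purely for illustration. It only shows that the sequential and parallel pipelines are written the same way and produce the same sum; whether the parallel version is actually faster on a given machine still has to be measured.

```java
import java.util.stream.IntStream;

class ExpensiveMapperSketch {

    // Hypothetical CPU-heavy function standing in for a costly Person#getAge().
    static int expensiveScore(int seed) {
        int x = seed;
        for (int i = 0; i < 10_000; i++) {
            x = x * 31 + i; // int overflow simply wraps around, which is fine here
        }
        return x & 0xFF;    // keep the result small (0..255)
    }

    static int sumSequential(int n) {
        return IntStream.range(0, n)
                .map(ExpensiveMapperSketch::expensiveScore)
                .sum();
    }

    static int sumParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(ExpensiveMapperSketch::expensiveScore)
                .sum();
    }

    public static void main(String[] args) {
        // Both pipelines compute the same value; only the execution strategy differs.
        System.out.println(sumSequential(2_000) == sumParallel(2_000)); // prints true
    }
}
```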

This Q&A, "java - How should I sum something with streams?", is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/24198561/
