java - How to do a parallel unique word count using Java 8 streams and lambdas?

Tags: java lambda mapreduce word-count java-stream

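What is the best way to do a parallel unique word count using Java 8 streams and lambdas?

I came up with a couple of approaches, but I am not convinced they are optimal. I know the map reduce solutions on Hadoop and wonder whether these approaches provide the same kind of parallelism.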

// Map Reduce Word Count
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Approach 1: groupingBy with a downstream collector that adds 1 per occurrence
Map<String, Integer> wordCount = Stream.of("dog", "cat", "dog", "dog", "cow", "house", "house")
        .parallel()
        .collect(Collectors.groupingBy(e -> e, Collectors.summingInt(e -> 1)));
System.out.println("number of dogs = " + wordCount.get("dog"));

// Approach 2: toConcurrentMap, merging the counts of duplicate keys with Integer::sum
Map<String, Integer> wordCount2 = Stream.of("dog", "cat", "dog", "dog", "cow", "house", "house")
        .parallel()
        .collect(Collectors.toConcurrentMap(keyWord -> keyWord, keyWord -> 1, Integer::sum));
System.out.println("number of dogs = " + wordCount2.get("dog"));

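Assume the real list will be much longer, possibly coming from a file or a generated stream, and that I want the counts of all words, not just "dog".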

Best answer

Look at the javadocs for Collectors.groupingBy:

@implNote The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If preservation of the order in which elements are presented to the downstream collector is not required, using groupingByConcurrent(Function, Supplier, Collector) may offer better parallel performance.

Now, if you look at Collectors.groupingByConcurrent, you will see that it is more or less equivalent to your second approach:

Returns a concurrent Collector implementing a cascaded "group by" operation on input elements of type T, grouping elements according to a classification function, and then performing a reduction operation on the values associated with a given key using the specified downstream Collector. The ConcurrentMap produced by the Collector is created with the supplied factory function.
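
As a rough illustration of the collector described above, here is a minimal sketch that counts words with Collectors.groupingByConcurrent. The class name ParallelWordCount, the helper countWords, and the whitespace-based tokenization are assumptions made for this example and are not part of the original question or answer. It uses the two-argument overload with Collectors.counting() as the downstream collector, so the counts are Long rather than Integer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ParallelWordCount {
    // Assumption for this sketch: words are separated by runs of whitespace.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: counts every distinct word across the given lines.
    public static ConcurrentMap<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                // Split each line into a stream of words.
                .flatMap(WHITESPACE::splitAsStream)
                // Drop empty tokens caused by leading whitespace or blank lines.
                .filter(word -> !word.isEmpty())
                // All threads accumulate into one shared ConcurrentMap,
                // so no per-thread maps have to be merged afterwards.
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("dog cat dog", "dog cow house house");
        ConcurrentMap<String, Long> counts = countWords(lines);
        System.out.println("number of dogs = " + counts.get("dog"));
        System.out.println("unique words = " + counts.size());
    }
}
```

For input that comes from a file, the same pipeline could start from Files.lines(path) instead of a List. Whether the concurrent collector is actually faster depends on the data and on contention for popular keys, so it is worth measuring both variants.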

Regarding "java - How to do a parallel unique word count using Java 8 streams and lambdas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26002383/
