Java parallel serialization and compression

Tags: java performance serialization parallel-processing

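I have a Java SE 8 application that processes large data sets in parallel. It generates 1M objects, which I want to serialize into a single compressed file. That file will be downloaded/uploaded by a web application. The parallel processing itself is well optimized. However, serialization and compression run sequentially, and that is the bottleneck of my application.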

I have tested different solutions: Kryo, ChronicleMap... I currently use Kryo with Bz2 compression. It works, but the performance is not good enough.

I cannot find any solution that performs serialization and compression in parallel. Any information on this is welcome.

Best answer

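It does not actually matter whether the data set is processed in parallel or sequentially: in a clean design, serialization is always a sequential operation (because output streams, sockets and the like are sequential by nature) and is kept separate from data set processing. So if you are going to serialize the data set and write it to a file, a connection, or raw memory, you have to define a barrier that guards the data against concurrent races and unexpected modifications.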

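Of course, there are cases where each worker thread serializes its own data, as in an HTTP server, but here we are talking about a single data set that is processed in parallel and serialized at the end.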
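One possible reading of the "barrier" mentioned above is a single guarded writer: worker threads may produce objects concurrently, but only one thread at a time is allowed to touch the output stream. The sketch below is an illustration under that assumption, not code from the answer; `SingleWriterBarrier`, `GuardedWriter` and `writeInParallel` are hypothetical names, and strings stand in for the real objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.stream.IntStream;

final class SingleWriterBarrier {

    /** Hypothetical helper: lets only one thread at a time write to the wrapped stream. */
    static final class GuardedWriter implements AutoCloseable {
        private final ObjectOutputStream out;
        private int written;

        GuardedWriter(ObjectOutputStream out) {
            this.out = out;
        }

        // The monitor of this object is the barrier: writes never interleave.
        synchronized void write(Object value) {
            try {
                out.writeObject(value);
                written++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        synchronized int written() {
            return written;
        }

        @Override
        public synchronized void close() throws IOException {
            out.close();
        }
    }

    /** Produces {@code count} items on parallel workers and serializes them through one writer. */
    static int writeInParallel(int count) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a file or socket
        try (GuardedWriter writer = new GuardedWriter(new ObjectOutputStream(sink))) {
            IntStream.range(0, count).parallel()
                     .forEach(i -> writer.write("item-" + i));
            return writer.written();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeInParallel(10_000) + " objects written");
    }
}
```

Note that the objects reach the stream in whatever order the workers happen to finish, and the writes themselves are still serialized by the lock, so this protects correctness rather than speeding up serialization.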

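Given the above, the following code should be a correct answer. It uses standard Java serialization plus GZIP compression. You can easily swap out the serialization and/or the compression in this code and benchmark it against your current solution.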

package com.example.demo;

import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import static java.lang.String.format;

public final class ParallelObjectsSerialization {

    private static final int ONE_MILLION = 1_000_000;
    private static final String SERIALIZE_FILE = "/tmp/out.bin";

    public static void main(String[] args) throws IOException, ClassNotFoundException {
//        List<Player> players = parallelGenerate1MPlayers();
        List<Player> players = seqGenerate1MPlayers();
        serialize(players);
        players.clear();
        players = deserialize();
    }

    private static List<Player> deserialize() throws IOException, ClassNotFoundException {
        long started = System.currentTimeMillis();
        List<Player> players = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new GZIPInputStream(new FileInputStream(SERIALIZE_FILE)))) {
            for (int i = 0; i < ONE_MILLION; i++) {
                players.add((Player) in.readObject());
            }
        }
        long time = System.currentTimeMillis() - started;
        System.out.println(format("deserialization of %d objects took %d ms", players.size(), time));
        return players;
    }

    private static final class Player implements Serializable {
        private final String name;
        private final int level;

        private Player(String name, int level) {
            this.name = name;
            this.level = level;
        }
    }

    private static List<Player> seqGenerate1MPlayers() {
        long started = System.currentTimeMillis();
        List<Player> players = new ArrayList<>(ONE_MILLION);
        for (int i = 0; i < ONE_MILLION; i++) {
            players.add(new Player(randomName(i), i));
        }
        long time = System.currentTimeMillis() - started;
        System.out.println(format("sequential generating of %d objects took %d ms", players.size(), time));
        return players;
    }

    private static List<Player> parallelGenerate1MPlayers() {
        long started = System.currentTimeMillis();
        Player[] players = new Player[ONE_MILLION];
        Arrays.parallelSetAll(players, (i) -> new Player(randomName(i), i));
        long time = System.currentTimeMillis() - started;
        System.out.println(format("parallel generating of %d objects took %d ms", players.length, time));
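        // Copy into an ArrayList: Arrays.asList is fixed-size, so players.clear() in main would throw.
        return new ArrayList<>(Arrays.asList(players));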
    }

    private static void serialize(List<Player> players) throws IOException {
        long started = System.currentTimeMillis();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(new FileOutputStream(SERIALIZE_FILE)))) {
            for (Player player : players) {
                out.writeObject(player);
            }
        }
        long time = System.currentTimeMillis() - started;
        System.out.println(format("serialization of %d objects took %d ms", players.size(), time));
    }

    private static String randomName(int seed) {
        StringBuilder builder = new StringBuilder();
        double chance = 30.0;
        for (char c = 'a'; c <= 'z'; c++) {
            if (Math.random() * 100.0 <= chance) {
                builder.append(c);
                if (builder.length() == 7) {
                    break;
                }
            }
        }
        if (builder.length() == 0) {
            builder.append("unknown").append(seed);
        }
        return builder.toString();
    }
}
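If the sequential serialize step above is still the bottleneck, one variation worth benchmarking is to keep the final write sequential but move the expensive work off it: split the list into chunks, serialize and compress each chunk in memory on parallel workers, then append the finished chunks to the output from a single thread. The sketch below is not part of the original answer. The class name, the length-prefixed frame layout and the use of `String` items in place of `Player` are all assumptions made for illustration, and GZIP is used only because it ships with the JDK.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ChunkedParallelSerialization {

    // Serializes and compresses one chunk in memory. Each call owns its streams,
    // so several chunks can be processed concurrently.
    static byte[] compressChunk(List<String> chunk) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeInt(chunk.size());
            for (String item : chunk) {
                out.writeObject(item);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static void write(List<String> items, int chunkSize, OutputStream target) throws IOException {
        int chunks = (items.size() + chunkSize - 1) / chunkSize;
        // Parallel part: chunks are compressed independently; the collected list keeps chunk order.
        List<byte[]> frames = IntStream.range(0, chunks).parallel()
                .mapToObj(c -> compressChunk(
                        items.subList(c * chunkSize, Math.min(items.size(), (c + 1) * chunkSize))))
                .collect(Collectors.toList());
        // Sequential part: a single thread appends length-prefixed frames to the output.
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(target));
        out.writeInt(frames.size());
        for (byte[] frame : frames) {
            out.writeInt(frame.length);
            out.write(frame);
        }
        out.flush();
    }

    static List<String> read(InputStream source) throws IOException, ClassNotFoundException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(source));
        List<String> items = new ArrayList<>();
        int frames = in.readInt();
        for (int f = 0; f < frames; f++) {
            byte[] frame = new byte[in.readInt()];
            in.readFully(frame);
            try (ObjectInputStream chunk = new ObjectInputStream(
                    new GZIPInputStream(new ByteArrayInputStream(frame)))) {
                int size = chunk.readInt();
                for (int i = 0; i < size; i++) {
                    items.add((String) chunk.readObject());
                }
            }
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        List<String> items = IntStream.range(0, 100_000)
                .mapToObj(i -> "player-" + i).collect(Collectors.toList());
        ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for the output file
        write(items, 10_000, file);
        List<String> restored = read(new ByteArrayInputStream(file.toByteArray()));
        System.out.println("round trip ok: " + items.equals(restored));
    }
}
```

This version holds every compressed chunk in memory before writing, and compressing chunks separately usually costs some compression ratio, so whether it beats the single-stream version depends on chunk size and data and has to be measured.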

Regarding Java parallel serialization and compression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47951886/
