java - Large number of objects in Java (using HashMap)

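Hello,

I'm currently working on word prediction in Java. For this I'm using an NGram-based model, but I've run into some memory issues...

My first model looked like this: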

import java.io.Serializable;

public class NGram implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient int count;
    private int id;
    private NGram next;

    public NGram(int idP) {
        this.id = idP;
    }
}

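But it takes too much memory, so I figured I needed to optimize it. My idea was that if I have "hello the world" and "hello the people", then instead of storing two n-grams I could keep the shared prefix "hello the" once and give it two possible continuations: "people" and "world".

To make this clearer, here is my new model: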

import java.io.Serializable;
import java.util.HashMap;

public class BNGram implements Serializable {
    private static final long serialVersionUID = 1L;
    private int id;
    private HashMap<Integer,BNGram> next;
    private int count = 1;

    public BNGram(int idP) {
        this.id = idP;
        this.next = new HashMap<Integer, BNGram>();
    }
}
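For illustration, here is a minimal sketch of how a tree of this shape might be filled so that shared prefixes are stored only once. `PrefixNode`, its `add` and `size` methods, the word ids, and the dummy root id are all hypothetical and not part of the original code; the sketch only shows the prefix-sharing idea, not a memory-optimized implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type modelled on BNGram; all names here are illustrative only.
class PrefixNode {
    final int id;                                   // word id
    int count = 0;                                  // number of sequences that passed through this node
    final Map<Integer, PrefixNode> next = new HashMap<>();

    PrefixNode(int id) {
        this.id = id;
    }

    // Walks the word ids, reusing existing children and creating the missing ones.
    void add(int... wordIds) {
        PrefixNode node = this;
        for (int wordId : wordIds) {
            node = node.next.computeIfAbsent(wordId, PrefixNode::new);
            node.count++;
        }
    }

    // Counts this node plus all of its descendants.
    int size() {
        int total = 1;
        for (PrefixNode child : next.values()) {
            total += child.size();
        }
        return total;
    }
}

class PrefixTreeSketch {
    public static void main(String[] args) {
        // Assumed ids: hello=1, the=2, world=3, people=4; the root uses a dummy id of -1.
        PrefixNode root = new PrefixNode(-1);
        root.add(1, 2, 3); // "hello the world"
        root.add(1, 2, 4); // "hello the people"

        System.out.println(root.size() - 1);        // 4 nodes besides the root, instead of 6
        System.out.println(root.next.get(1).count); // 2: "hello" was seen twice
    }
}
```

Note that every node in this sketch still owns its own `HashMap`, including leaf nodes that never get a child, so sharing prefixes reduces the node count but not the per-node overhead.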

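But my second model seems to consume twice as much memory... I think the HashMap is to blame, but I don't know how to reduce its footprint. I tried other Map implementations, such as Trove, but it didn't change anything.

To give you an idea: for a 9 MB text with 57,818 distinct words (distinct words, not the total word count), my javaw process consumes 1.2 GB of memory after generating the NGrams... If I save the model with GZIPOutputStream, it takes about 18 MB on disk.

So my question is: how can I use less memory? Can I do something like compression (as with serialization)? I need to add this to another application, so I need to reduce the memory usage before...

Thanks a lot, and sorry for my English...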

ZiMath

Best answer

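You need a specialized structure to achieve what you want.

Take a look at Apache's PatriciaTrie. It behaves like a Map, but it is memory-efficient and works with String keys. It is also very fast: operations are O(k), where k is the number of bits in the largest key.

It has an operation that fits your immediate need: prefixMap(), which returns a SortedMap view of the trie containing the Strings that are prefixed by the given key.

A short usage example: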

import java.util.SortedMap;

import org.apache.commons.collections4.trie.PatriciaTrie;

public class Patricia {

    public static void main(String[] args) {

        PatriciaTrie<String> trie = new PatriciaTrie<>();

        String world = "hello the world";
        String people = "hello the people";

        // Only the keys matter in this example, so the values are left null.
        trie.put(world, null);
        trie.put(people, null);

        // prefixMap() returns a sorted view of all entries whose key starts with the prefix.
        SortedMap<String, String> map1 = trie.prefixMap("hello");
        System.out.println(map1.keySet());  // [hello the people, hello the world]

        SortedMap<String, String> map2 = trie.prefixMap("hello the w");
        System.out.println(map2.keySet()); // [hello the world]

        SortedMap<String, String> map3 = trie.prefixMap("hello the p");
        System.out.println(map3.keySet());  // [hello the people]
    }
}

There are also the tests, which contain more examples.
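As a rough sketch of how a prefix lookup could drive word prediction, the example below stores a count per n-gram and picks the most frequent key under a prefix. To stay runnable without extra dependencies it uses `java.util.TreeMap.subMap` as a stand-in for `prefixMap()`, so it does not have the memory characteristics of a PATRICIA trie. `PredictionSketch`, `predict`, and the local `prefixMap` helper are hypothetical names, not part of any library.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class PredictionSketch {

    // Stand-in for PatriciaTrie.prefixMap(): a view of every key that starts with the prefix.
    // Assumes no key contains the character Character.MAX_VALUE.
    static SortedMap<String, Integer> prefixMap(TreeMap<String, Integer> counts, String prefix) {
        return counts.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    // Returns the most frequent n-gram that starts with the prefix, or null if there is none.
    static String predict(TreeMap<String, Integer> counts, String prefix) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : prefixMap(counts, prefix).entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        String[] ngrams = {"hello the world", "hello the people", "hello the world"};
        for (String ngram : ngrams) {
            // merge() stores 1 for a new n-gram, otherwise adds 1 to the stored count.
            counts.merge(ngram, 1, Integer::sum);
        }

        System.out.println(predict(counts, "hello the ")); // hello the world
        System.out.println(predict(counts, "goodbye"));    // null
    }
}
```

With Commons Collections on the classpath, the same idea should carry over by replacing the `TreeMap` with a `PatriciaTrie<Integer>` and calling its own `prefixMap()` in place of the helper.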

Regarding "java - Large number of objects in Java (using HashMap)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29182538/
