java - Python vs. Java: memory used for storing dictionaries

I want to build a list/array of dictionaries. Each dictionary maps an integer key to a (possibly long) array of ints. I implemented this in Python with numpy as follows:

import numpy as np

def get_dicts(dict_names):
    # One dictionary per input file: int key -> numpy array of ints
    dictionaries = [None]*len(dict_names)
    k = 0
    my_dict = {}
    for i in dict_names:
        local_dict = my_dict.copy()
        with open(i, 'rt') as f:
            for line in f:
                # First number on the line is the key, the rest are the values
                v = np.fromstring(line, dtype=int, sep=' ')
                local_dict[v[0]] = v[1:]

        dictionaries[k] = local_dict
        k += 1
        print("Dictionary %s extracted" % i)
    return dictionaries

def main():
    dict_names = [str(i) + "_tweet_mapping" for i in range(1, 45)]
    dictionaries = get_dicts(dict_names)

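The run time is OK: 90 seconds. Later in my problem, however, Python turned out to be too slow, so I ported everything to Java. In Java, building these dictionaries as an ArrayList of hash tables takes a lot of memory, to the point of running into heap problems. The run time is also much slower. My Java implementation is as follows: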

private ArrayList<Hashtable<Integer, Integer[]>> get_dicts(String [] dictionary_files) {

    ArrayList<Hashtable<Integer, Integer []>>  my_dictionaries = new ArrayList<Hashtable<Integer,Integer []>>(dictionary_files.length);
    for (int i=0; i<dictionary_files.length; i++) {
         my_dictionaries.add(get_one_dict(dictionary_files[i]));
    }
    return my_dictionaries;

}

private Hashtable<Integer, Integer []> get_one_dict(String dictionary_file){

    Hashtable<Integer, Integer []> my_dictionary = new Hashtable<Integer, Integer[]>();
    try{
        BufferedReader br = new BufferedReader(new FileReader(dictionary_file));
        try{
            String s;
            while((s = br.readLine()) != null){
                String [] words = s.split(" ");
                int n_tweets = words.length-1;
                Integer [] int_line = new Integer[n_tweets];
                int key_word = Integer.parseInt(words[0]);
                for (int j=0; j<n_tweets; j++){
                    int_line[j] = Integer.parseInt(words[j+1]);

                }

                my_dictionary.put(key_word, int_line);

            }
        }finally{
            br.close();
        }
    } catch(IOException e){
        e.printStackTrace();
    }catch(OutOfMemoryError e){
        e.printStackTrace();
    }catch(Exception e){
        e.printStackTrace();
    }
    System.out.println("Dictionary " + dictionary_file +" extracted");
    return my_dictionary;
}

Why is there such a huge difference in performance, in both time and memory? What can I do to reduce memory consumption in Java?

Best answer

You are using the wrapper type Integer instead of int. For the map keys you have no choice, but for the arrays you do.

Using Map<Integer, int[]> reduces the memory consumption per array element from 4 + 16 bytes to 4 bytes. (*)
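To illustrate the suggestion, here is a minimal sketch of the parsing step that stores the values as a primitive int[] instead of an Integer[]. The class name PrimitiveDictParser and the method parseLines are hypothetical, and file reading is left out so the example stays self-contained; it assumes each line has the form "key v1 v2 ..." separated by whitespace. A HashMap is used here, but any Map implementation would work the same way.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative, not from the original post.
class PrimitiveDictParser {

    // Parses lines of the assumed form "key v1 v2 ..." into key -> int[] values.
    static Map<Integer, int[]> parseLines(Iterable<String> lines) {
        Map<Integer, int[]> dict = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] words = trimmed.split("\\s+");
            // Primitive array: 4 bytes per element, no per-element Integer object.
            int[] values = new int[words.length - 1];
            for (int j = 0; j < values.length; j++) {
                values[j] = Integer.parseInt(words[j + 1]);
            }
            // Only the key is boxed, once per line.
            dict.put(Integer.parseInt(words[0]), values);
        }
        return dict;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("7 10 20 30", "8 40");
        Map<Integer, int[]> dict = parseLines(sample);
        System.out.println(Arrays.toString(dict.get(7))); // prints [10, 20, 30]
    }
}
```

The only boxing left is the one Integer key per line, which the Map interface requires.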


You should also forget about Hashtable and use HashMap instead. The former is synchronized, which you don't need. But that shouldn't be a big deal.

I guess the slowdown is mostly due to the unnecessary memory allocation.


(*) 4 (or 8 on a 64-bit JVM without compressed OOPs) for the reference, 16 for the object (which is the minimum object size).
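As a rough worked example of that arithmetic, the sketch below multiplies the per-element figures from the footnote by an element count. The class name, method names, and the one-million-element count are made up for illustration. The estimate ignores array headers and the fact that small Integer values (-128 to 127) are cached and shared, so treat it as an approximate upper bound rather than a measurement.

```java
// Back-of-the-envelope estimate using the per-element sizes quoted above.
// All names here are hypothetical.
class BoxingOverheadEstimate {
    static final int INT_BYTES = 4;          // one primitive int
    static final int MIN_OBJECT_BYTES = 16;  // minimum object size, per the footnote

    // Integer[]: each element is a reference plus a separate Integer object.
    static long integerArrayBytes(long elements, int referenceBytes) {
        return elements * (referenceBytes + MIN_OBJECT_BYTES);
    }

    // int[]: each element is just the 4-byte value.
    static long intArrayBytes(long elements) {
        return elements * INT_BYTES;
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // illustrative element count
        System.out.println(integerArrayBytes(n, 4)); // prints 20000000 (4-byte references)
        System.out.println(integerArrayBytes(n, 8)); // prints 24000000 (8-byte references)
        System.out.println(intArrayBytes(n));        // prints 4000000
    }
}
```

Under these assumptions the boxed array needs five to six times the memory of the primitive one.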

The original question, "java - Python vs. Java: memory used for storing dictionaries", is on Stack Overflow: https://stackoverflow.com/questions/27158637/
