java - 重复循环大量对象时如何优化性能

我有一个简单的文件，每行包含两个整数值(源整数和目标整数)。每条线代表两个值之间的关系。该文件未排序，实际文件包含大约 400 万行。排序后可能如下所示:

sourceId;targetId   
1;5    
2;3   
4;7  
7;4  
8;7  
9;5

我的目标是创建一个新对象，该对象将表示具有唯一标识符的列表中的所有唯一相关整数。此示例的预期输出应为以下三个对象:

0, [1, 5, 9]  
1, [2, 3]  
2, [4, 7, 8]

因此 groupId 0 包含一组关系(1、5 和 9)。

下面是我当前创建这些对象列表的方法。 Relation 对象列表包含内存中的所有行。 GroupedRelation 列表应该是最终结果。

public class GroupedRelationBuilder {

    private List<Relation> relations;
    private List<GroupedRelation> groupedRelations;
    private List<String> ids;
    private int frameId;

    public void build() {
        relations = new ArrayList<>();
        relations.add(new Relation(1, 5));
        relations.add(new Relation(4, 7));
        relations.add(new Relation(8, 7));
        relations.add(new Relation(7, 4));
        relations.add(new Relation(9, 5));
        relations.add(new Relation(2, 3));

        // sort
        relations.sort(Comparator.comparing(Relation::getSource).thenComparing(Relation::getTarget));

        // build the groupedRelations
        groupId = 0;
        groupedRelations = new ArrayList<>();
        for (int i = 0; relations.size() > 0;) {
            ids = new ArrayList<>();
            int compareSource = relations.get(i).getSource();
            int compareTarget = relations.get(i).getTarget();
            ids.add(Integer.toString(compareSource));
            ids.add(Integer.toString(compareTarget));               
            relations.remove(i);
            for (int j = 0; j < relations.size(); j++) {
                int source = relations.get(j).getSource();
                int target = relations.get(j).getTarget();
                if ((source == compareSource || source == compareTarget) && !ids.contains(Integer.toString(target))) {
                    ids.add(Integer.toString(target));                      
                    relations.remove(j);
                    continue;
                }
                if ((target == compareSource || target == compareTarget) && !ids.contains(Integer.toString(source))) {
                    ids.add(Integer.toString(source));                      
                    relations.remove(j);
                    continue;
                }
            }
            if (relations.size() > 0) {
                groupedRelations.add(new GroupedRelation(groupId++, ids));
            }
        }
    }

    class GroupedRelation {
        private int groupId;
        private List<String> relatedIds;

        public GroupedRelation(int groupId, List<String> relations) {
            this.groupId = groupId;
            this.relatedIds = relations;
        }

        public int getGroupId() {
            return groupId;
        }

        public List<String> getRelatedIds() {
            return relatedIds;
        }
    }

    class Relation {
        private int source;
        private int target;

        public Relation(int source, int target) {
            this.source = source;
            this.target = target;
        }

        public int getSource() {
            return source;
        }

        public void setSource(int source) {
            this.source = source;
        }

        public int getTarget() {
            return target;
        }

        public void setTarget(int target) {
            this.target = target;
        }
    }
}

当我运行这个小示例程序时，创建 1000 个 GroupedRelation 对象需要 15 秒。创建 100 万个 GroupedRelation 需要 250 分钟..

我正在寻求帮助来优化我的代码，该代码确实能得到我想要的结果，但需要很长时间。

是否可以优化迭代，使预期结果相同，但获得预期结果所需的时间显着减少？如果这是可能的，你会怎么做？

最佳答案

由于ids.contains，当前的实现速度很慢步。 ArrayList.contains的时间复杂度方法是O(n) : 为了检查它是否包含一个元素，它会逐一检查元素，在最坏的情况下扫描整个列表。

更改 ids 的类型可以大大提高性能来自List<String>至Set<String> ，并使用HashSet<String>实例。预期时间复杂度为Set.contains实现是O(1) , 与列表相比要快得多。

关于java - 重复循环大量对象时如何优化性能，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/38292491/

java - 重复循环大量对象时如何优化性能

上一篇：java - 在 Java 中从 ArrayList 中删除所有项目的最佳实践

下一篇：java - 如何强制我的子类构造函数不调用基类构造函数？