I need to sort a list using a custom comparator: Collections.sort(availableItems, new TextClassifyCnnComparator(citem, false))
class TextClassifyCnnComparator implements Comparator<Item> {
private Item citem;
private boolean isAsc;
public TextClassifyCnnComparator(Item citem) {
this(citem, true);
}
public TextClassifyCnnComparator(Item citem, boolean isAsc) {
this.citem = citem;
this.isAsc = isAsc;
}
private Double calcSimilarScore(Item item) {
return item.getEncodedFromCNN().dotProduct(citem.getEncodedFromCNN());
}
@Override
public int compare(Item o1, Item o2) {
if (isAsc) {
return calcSimilarScore(o1).compareTo(calcSimilarScore(o2));
}
return calcSimilarScore(o2).compareTo(calcSimilarScore(o1));
}
}
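Will Java map each item to its score and call calcSimilarScore only once per item, or will it be called multiple times (once for each of the 2 items in every compared pair)?
If it is called multiple times, how can I optimize this task?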
======== Update 1: ========
I have refactored my comparator:
class TextClassifyCnnComparator implements Comparator<Integer> {
private boolean isAsc;
private List<Double> list;
public TextClassifyCnnComparator(Item citem, List<Item> list) {
this(citem, list, true);
}
public TextClassifyCnnComparator(Item citem, List<Item> list, boolean isAsc) {
this.list = list.parallelStream().map(item -> calcSimilarScore(item, citem)).collect(Collectors.toList());
this.isAsc = isAsc;
}
private Double calcSimilarScore(Item item1, Item item2) {
return item1.getEncodedFromCNN().dotProduct(item2.getEncodedFromCNN());
}
public List<Integer> createIndexes() {
List<Integer> indexes = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
indexes.add(i); // Autoboxing
}
return indexes;
}
@Override
public int compare(Integer index1, Integer index2) {
// Autounbox from Integer to int to use as array indexes
if (isAsc)
return list.get(index1).compareTo(list.get(index2));
return list.get(index2).compareTo(list.get(index1));
}
}
...
TextClassifyCnnComparator comparator = new TextClassifyCnnComparator(citem, availableItems);
List<Integer> indexes = comparator.createIndexes();
Collections.sort(indexes, comparator);
return indexes.parallelStream().map(index -> availableItems.get(index)).collect(Collectors.toList());
I think it can still be optimized further.
Best answer
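The following optimizations are possible:
- Wherever feasible, use double (the primitive type) instead of Double (the object wrapper class that holds a double).
- The citem part of the comparison can be precomputed in the constructor. (citem may not even be needed as a field any more.)
- A value may be compared several times, so a cache can be used, i.e. a Map from Item to its double value.
So: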
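import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class TextClassifyCnnComparator implements Comparator<Item> {
    private final Item citem;
    private final boolean isAsc;
    // Encoding of citem, fetched once instead of on every comparison.
    // ECNN is a placeholder for the return type of getEncodedFromCNN().
    private final ECNN encodedFromCNN;
    // Cache of scores already computed; relies on Item's equals/hashCode.
    private final Map<Item, Double> scores = new HashMap<>();

    public TextClassifyCnnComparator(Item citem) {
        this(citem, true);
    }

    public TextClassifyCnnComparator(Item citem, boolean isAsc) {
        this.citem = citem;
        this.isAsc = isAsc;
        encodedFromCNN = citem.getEncodedFromCNN();
    }

    private double calcSimilarScore(Item item) {
        Double cached = scores.get(item);
        if (cached != null) {
            return cached;
        }
        double score = item.getEncodedFromCNN().dotProduct(encodedFromCNN);
        scores.put(item, score);
        return score;
    }

    @Override
    public int compare(Item o1, Item o2) {
        // calcSimilarScore returns a primitive double, so use Double.compare
        // rather than calling compareTo on the result.
        if (isAsc) {
            return Double.compare(calcSimilarScore(o1), calcSimilarScore(o2));
        }
        return Double.compare(calcSimilarScore(o2), calcSimilarScore(o1));
    }
}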
Or, in Java 8 style:
private double calcSimilarScore(Item item) {
return scores.computeIfAbsent(item,
it -> it.getEncodedFromCNN().dotProduct(encodedFromCNN));
}
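To check the "once or multiple times" question empirically, here is a minimal, self-contained sketch that counts how often a score function runs during a sort, with and without a cache like the one above. The class ScoreCallCountDemo, its simplified Item, the helper names, and the trivial score (value * 2.0, standing in for the dot product) are all hypothetical and not part of the original code. The sort calls the comparator once per comparison, and each comparison scores both arguments, so without a cache the count comes out well above the number of items.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToDoubleFunction;

class ScoreCallCountDemo {

    // Hypothetical stand-in for the real Item; keeps the default
    // identity-based equals/hashCode, which is enough for the cache here.
    static final class Item {
        final double value;
        Item(double value) { this.value = value; }
    }

    // Builds n items with pseudo-random values (fixed seed for repeatability).
    static List<Item> makeItems(int n) {
        Random random = new Random(42);
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            items.add(new Item(random.nextDouble()));
        }
        return items;
    }

    // Sorts a copy of the list and returns how often the score function ran.
    static int countScoreCalls(List<Item> items, boolean useCache) {
        AtomicInteger calls = new AtomicInteger();
        // Placeholder for the expensive dot product from the question.
        ToDoubleFunction<Item> rawScore = item -> {
            calls.incrementAndGet();
            return item.value * 2.0;
        };
        Map<Item, Double> cache = new HashMap<>();
        ToDoubleFunction<Item> score;
        if (useCache) {
            // Compute each item's score at most once, then reuse it.
            score = item -> cache.computeIfAbsent(item, key -> rawScore.applyAsDouble(key));
        } else {
            score = rawScore;
        }
        Comparator<Item> comparator =
            (a, b) -> Double.compare(score.applyAsDouble(a), score.applyAsDouble(b));
        List<Item> copy = new ArrayList<>(items);
        copy.sort(comparator);
        return calls.get();
    }

    public static void main(String[] args) {
        List<Item> items = makeItems(1000);
        System.out.println("items:             " + items.size());
        System.out.println("calls, no cache:   " + countScoreCalls(items, false));
        System.out.println("calls, with cache: " + countScoreCalls(items, true));
    }
}
```

With the cache, the count equals the number of items, because every item takes part in at least one comparison and is scored exactly once. Note that a Map-based cache depends on Item having consistent equals and hashCode; if the real Item overrides them, two equal items would share one cached score.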
Source: a similar question on Stack Overflow about whether a Java Comparator computes the score once or multiple times: https://stackoverflow.com/questions/47238852/