Java: filtering a collection and retrieving data by multiple fields

Tags: java algorithm collections data-retrieval

I have a class:

public class Address {
    private String country;
    private String state;
    private String city;
}

And a list of Person objects. The Person class looks like:

public class Person {
    private String country;
    private String state;
    private String city;
    //other fields
}

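I need to filter the Person objects and get the best-matching one. An Address object has at least one non-null field. A Person object may have none, some, or all of the mentioned fields initialized.

Here is one possible input example: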

Three Person objects:
a. PersonA: country = 'A'
b. PersonB: country = 'A', state = 'B'
c. PersonC: country = 'A', state = 'B', city = 'C'

Address object:
a. Address: country = 'A', state = 'B'

The expected result after filtering is PersonB. If only PersonA and PersonC were present, PersonA would be preferred.
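To make the matching rule concrete, here is a minimal linear-scan sketch of it. The names `BaselineMatch`, `Loc`, `specificity` and `best` are hypothetical and not part of the question; the sketch assumes that a person matches when every field it sets equals the corresponding address field, and that the match with the most set fields wins.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical baseline: a plain O(n) scan that scores each candidate.
final class BaselineMatch {
    // Hypothetical holder for the three fields; null means "not set".
    static final class Loc {
        final String country, state, city;
        Loc(String country, String state, String city) {
            this.country = country;
            this.state = state;
            this.city = city;
        }
        String[] levels() { return new String[] {country, state, city}; }
    }

    // Number of fields the person sets, or -1 if any set field
    // differs from (or is missing in) the address.
    static int specificity(Loc person, Loc address) {
        String[] p = person.levels();
        String[] a = address.levels();
        int score = 0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] == null) continue;          // unset person field matches anything
            if (!p[i].equals(a[i])) return -1;   // mismatch, or address is less specific
            score++;
        }
        return score;
    }

    // Picks the matching person with the highest score, if any.
    static Optional<Loc> best(List<Loc> people, Loc address) {
        Loc best = null;
        int bestScore = -1;
        for (Loc p : people) {
            int s = specificity(p, address);
            if (s > bestScore) {
                bestScore = s;
                best = p;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With the three persons from the example and the address (A, B), `best` returns PersonB; with only PersonA and PersonC it returns PersonA, because PersonC sets a city that the address does not have.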

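I would like to show how I tried to do this, but it is really a pure brute-force algorithm and I don't like it. Its complexity grows with every newly added field. I also considered using Guava filters with predicates, but I don't know what the predicate should be.

What is the preferred algorithm for this kind of filtering, other than brute force?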

Best answer

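To avoid brute force, you need to index your persons by address. For a good search you definitely need a country (guess it or default it somehow, otherwise the results will be too imprecise anyway).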

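The index will be a number: the first 3 digits stand for the country, the next 3 digits for the state, and the last 4 digits for the city. With that layout an int can hold 213 countries (only 206 as of 2016), each with up to 999 states and 9999 cities.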
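The digit layout can be sketched as plain integer arithmetic. The class `AddressCode` and its method names are hypothetical; the limits are the ones stated above, and an id of 0 is assumed to mean "state or city not set".

```java
// Hypothetical helper showing the 3 + 3 + 4 digit layout of the index.
final class AddressCode {
    static final int STATE_FACTOR = 10_000;        // city uses the last 4 digits
    static final int COUNTRY_FACTOR = 10_000_000;  // state uses the 3 digits before them

    // Packs the three ids into one int; 0 means "state/city not set".
    static int pack(int countryId, int stateId, int cityId) {
        if (countryId < 1 || countryId > 213) {
            throw new IllegalArgumentException("country id out of range: " + countryId);
        }
        if (stateId < 0 || stateId > 999) {
            throw new IllegalArgumentException("state id out of range: " + stateId);
        }
        if (cityId < 0 || cityId > 9999) {
            throw new IllegalArgumentException("city id out of range: " + cityId);
        }
        return countryId * COUNTRY_FACTOR + stateId * STATE_FACTOR + cityId;
    }

    static int countryOf(int code) { return code / COUNTRY_FACTOR; }

    static int stateOf(int code) { return code / STATE_FACTOR % 1000; }

    static int cityOf(int code) { return code % STATE_FACTOR; }
}
```

The largest value this produces is 213 * 10,000,000 + 999 * 10,000 + 9,999 = 2,139,999,999, which is below Integer.MAX_VALUE (2,147,483,647), so the sum cannot overflow as long as the range checks hold.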

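This lets us use hashCode and a TreeSet to index your Person instances and look them up by partial address in O(log(n)) without touching their fields. The fields are touched while the TreeSet is being built, and you will need some extra logic for modifying a Person so that the index stays intact.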
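As a rough illustration of both points, the sketch below does a floor lookup on a TreeSet ordered by an int code and shows one way to keep the set consistent when a code changes. `FloorLookupDemo`, `Entry`, `floorByCode` and `reindex` are hypothetical names, and the codes are passed in directly rather than derived from address fields.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class FloorLookupDemo {
    // Hypothetical entry: a name plus its packed address code.
    static final class Entry {
        final String name;
        int code;
        Entry(String name, int code) {
            this.name = name;
            this.code = code;
        }
    }

    // A set ordered only by code; two entries with the same code
    // count as duplicates, so only the first one is kept.
    static TreeSet<Entry> newIndex() {
        return new TreeSet<>(Comparator.comparingInt((Entry e) -> e.code));
    }

    // Greatest entry whose code is <= the given code, or null if none.
    static Entry floorByCode(TreeSet<Entry> index, int code) {
        return index.floor(new Entry("probe", code));
    }

    // Changing the sort key of an element while it is inside the set
    // breaks the ordering, so remove it, change it, then add it back.
    static void reindex(TreeSet<Entry> index, Entry e, int newCode) {
        index.remove(e);
        e.code = newCode;
        index.add(e);
    }
}
```

Note that ordering by code alone means two persons with the same address cannot both live in the set; if that matters, a map from code to a list of persons (for example a TreeMap) may be a better fit.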

The index is computed part by part, starting with the country:

    import java.util.HashMap;
    import java.util.Map;

    public class PartialAddressSearch {

        private final static Map<String, AddressPartHolder> COUNTRY_MAP = new HashMap<>(200);

        private static class AddressPartHolder {
            int id;
            Map<String, AddressPartHolder> subPartMap;

            public AddressPartHolder(int id, Map<String, AddressPartHolder> subPartMap) {
                this.id = id;
                this.subPartMap = subPartMap;
            }
        }

        public static int getCountryStateCityHashCode(String country, String state, String city) {
            if (country != null && country.length() != 0) {
                int result = 0;
                AddressPartHolder countryHolder = COUNTRY_MAP.get(country);
                if (countryHolder == null) {
                    countryHolder = new AddressPartHolder(COUNTRY_MAP.size() + 1, new HashMap<>());
                    COUNTRY_MAP.put(country, countryHolder);
                }
                result += countryHolder.id * 10000000;

                if (state != null && state.length() != 0) {
                    AddressPartHolder stateHolder = countryHolder.subPartMap.get(state);
                    if (stateHolder == null) {
                        stateHolder = new AddressPartHolder(countryHolder.subPartMap.size() + 1, new HashMap<>());
                        countryHolder.subPartMap.put(state, stateHolder);
                    }
                    result += stateHolder.id * 10000;

                    if (city != null && city.length() != 0) {
                        AddressPartHolder cityHolder = stateHolder.subPartMap.get(city);
                        if (cityHolder == null) {
                            cityHolder = new AddressPartHolder(stateHolder.subPartMap.size() + 1, null);
                            stateHolder.subPartMap.put(city, cityHolder);
                        }
                        result += cityHolder.id;
                    }
                }

                return result;
            } else {
                throw new IllegalArgumentException("Non-empty country is expected");
            }
        }
    }

For your Person and Address classes you can define hashCode and compareTo based on the natural ordering of int:

    public class Person implements Comparable {
        private String country;
        private String state;
        private String city;

        @Override
        public boolean equals(Object o) {
             //it's important but I removed it for readability
        }

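        @Override
        public int hashCode() {
            //the static helper lives in PartialAddressSearch, so qualify the call
            return PartialAddressSearch.getCountryStateCityHashCode(country, state, city);
        }

        @Override
        public int compareTo(Object o) {
            //could be further improved by storing hashcode in a field to avoid re-calculation on sorting
            //Integer.compare avoids relying on the subtraction never overflowing
            return Integer.compare(hashCode(), o.hashCode());
        }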

    }

    public class Address implements Comparable {
        private String country;
        private String state;
        private String city;


        @Override
        public boolean equals(Object o) {
             //removed for readability
        }

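        @Override
        public int hashCode() {
            //the static helper lives in PartialAddressSearch, so qualify the call
            return PartialAddressSearch.getCountryStateCityHashCode(country, state, city);
        }

        @Override
        public int compareTo(Object o) {
            //could be further improved by storing hashcode in a field to avoid re-calculation on sorting
            //Integer.compare avoids relying on the subtraction never overflowing
            return Integer.compare(hashCode(), o.hashCode());
        }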

    }

    public class AddressPersonAdapter extends Person {
        private final Address delegate;

        public AddressPersonAdapter(Address delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean equals(Object o) {
            return delegate.equals(o);
        }

        @Override
        public int hashCode() {
            return delegate.hashCode();
        }
    }

After that, your filtering code shrinks to populating the index and computing the floor of your partial address:

    TreeSet<Person> personSetByAddress = new TreeSet<>();
    Person personA = new Person();
    personA.setCountry("A");
    personSetByAddress.add(personA);
    Person personB = new Person();
    personB.setCountry("A");
    personB.setState("B");
    personSetByAddress.add(personB);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    Person{hashCode=10010000, country='A', state='B', city='null'}

If you don't have PersonB:

    TreeSet<Person> personSetByAddress = new TreeSet<>();
    Person personA = new Person();
    personA.setCountry("A");
    personSetByAddress.add(personA);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    Person{hashCode=10000000, country='A', state='null', city='null'}

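Edit:

A corner case that needs additional validation is when there is no smaller element (or no greater one, if we needed ceiling) within the same country. For example: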

    TreeSet<Person> personSetByAddress = new TreeSet<>();
    Person personA = new Person();
    personA.setCountry("D");
    personSetByAddress.add(personA);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    Person{hashCode=10000000, country='D', state='null', city='null'}

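That is, we fall through to the nearest country. To fix this, we need to check that the country digits are still the same. We can do that by subclassing TreeSet and adding the check there: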

    //we need this class to allow flooring just by id
    public class IntegerPersonAdapter extends Person {
        private Integer id;

        public IntegerPersonAdapter(Integer id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return id.equals(o);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }

        @Override
        public int compareTo(Object o) {
            return id.hashCode() - o.hashCode();
        }

        @Override
        public String toString() {
            return id.toString();
        }
    }

    import java.util.TreeSet;

    public class StrictCountryTreeSet extends TreeSet<Person> {

        @Override
        public Person floor(Person e) {
            Person candidate = super.floor(e);
            if (candidate != null) {
                //we check if the country is the same
                int candidateCode = candidate.hashCode();
                int eCode = e.hashCode();
                if (candidateCode == eCode) {
                    return candidate;
                } else {
                    int countryCandidate = candidateCode / 10000000;
                    if (countryCandidate == (eCode / 10000000)) {
                        //we check if the state is the same
                        int stateCandidate = candidateCode / 10000;
                        if (stateCandidate == (eCode / 10000)) {
                            //we check if it is a state: the city part is the last 4 digits
                            if (candidateCode % 10000 == 0) {
                                return candidate;
                            } else { //since it's not exact match we haven't found a city - we need to get someone just from state
                                return this.floor(new IntegerPersonAdapter(stateCandidate * 10000));
                            }

                        } else if (stateCandidate % 1000 == 0) { //we check if it's a country already: the state part is 3 digits
                            return candidate;
                        } else {
                            return this.floor(new IntegerPersonAdapter(countryCandidate * 10000000));
                        }
                    }
                }
            }
            return null;
        }
    }
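Stripped of the state and city fallback, the country check is a single integer division. The following sketch works on a plain TreeSet<Integer> of codes; `CountryGuard` and `floorWithinCountry` are hypothetical names, and the factor assumes the 3 + 3 + 4 digit layout described earlier.

```java
import java.util.TreeSet;

// Hypothetical helper: floor lookup that refuses to cross a country boundary.
final class CountryGuard {
    static final int COUNTRY_FACTOR = 10_000_000;

    // Returns the greatest code <= query that has the same country digits,
    // or null if the floor belongs to another country or does not exist.
    static Integer floorWithinCountry(TreeSet<Integer> codes, int query) {
        Integer candidate = codes.floor(query);
        if (candidate == null) {
            return null;
        }
        boolean sameCountry = candidate / COUNTRY_FACTOR == query / COUNTRY_FACTOR;
        return sameCountry ? candidate : null;
    }
}
```

This only guards the country level; it does not reproduce the state and city handling of the subclass above.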

Now our test case yields null once the set is a StrictCountryTreeSet:

    TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
    Person personA = new Person();
    personA.setCountry("D");
    personSetByAddress.add(personA);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    null

And the test for a different state also yields null:

    TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
    Person personD = new Person();
    personD.setCountry("D");
    personSetByAddress.add(personD);

    Person personE = new Person();
    personE.setCountry("A");
    personE.setState("E");
    personSetByAddress.add(personE);

    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressA = new Address();
    addressA.setCountry("A");

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    Address addressABC = new Address();
    addressABC.setCountry("A");
    addressABC.setState("B");
    addressABC.setCity("C");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    null

Note that in this case you will want to store the hashCode result in the Address and Person classes to avoid recalculating it.
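One possible way to cache the value is to compute it once in the constructor and keep the address fields immutable, so the cached code can never go stale. `AddressCoder` and `CachedPerson` are hypothetical names; the coder stands in for whatever function computes the index, such as getCountryStateCityHashCode.

```java
import java.util.Objects;

// Hypothetical stand-in for the function that computes the index code.
interface AddressCoder {
    int code(String country, String state, String city);
}

// Hypothetical person variant that computes its code exactly once.
final class CachedPerson implements Comparable<CachedPerson> {
    private final String country;
    private final String state;
    private final String city;
    private final int code; // cached index code

    CachedPerson(String country, String state, String city, AddressCoder coder) {
        this.country = country;
        this.state = state;
        this.city = city;
        this.code = coder.code(country, state, city); // the only call to the coder
    }

    @Override
    public int hashCode() {
        return code;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedPerson)) return false;
        CachedPerson p = (CachedPerson) o;
        return Objects.equals(country, p.country)
                && Objects.equals(state, p.state)
                && Objects.equals(city, p.city);
    }

    @Override
    public int compareTo(CachedPerson o) {
        return Integer.compare(code, o.code);
    }
}
```

If the address fields have to stay mutable, the cached code would need to be recomputed in each setter, and the object re-inserted into any TreeSet that holds it.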

Regarding "Java: filtering a collection and retrieving data by multiple fields", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39726165/
