java - Searching multiple HashMaps at the same time

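tl;dr: How can I search for an entry in several (read-only) Java HashMaps at the same time?

Long version:

I have several dictionaries of different sizes stored as HashMap<String, String>. Once they have been read in, they never change (strictly read-only). I want to check whether any dictionary stores an entry with my key, and if so, which one.

My code originally looked up a key like this: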

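public DictionaryEntry getEntry(String key) {
    // Check the dictionaries in order and return the first exact hit
    for (int i = 0; i < _numDictionaries; i++) {
        HashMap<String, String> map = getDictionary(i);
        if (map.containsKey(key))
            return new DictionaryEntry(map.get(key), i); // value plus index of the dictionary
    }
    return null; // the key is in none of the dictionaries
}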

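Then it got a bit more complicated: my search string may contain typos, or be a variant of a stored entry. For example, if the stored key is "banana", I might look up "bannana" or "a banana" and still want the entry for "banana" returned. Using the Levenshtein distance, I now loop over all dictionaries and every entry in them: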

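public DictionaryEntry getEntry(String key) {
    DictionaryEntry closestMatch = null;
    int closestDist = _maxLSDistance; // only distances below the threshold count as matches
    for (int i = 0; i < _numDictionaries; i++) {
        HashMap<String, String> map = getDictionary(i);
        for (Map.Entry<String, String> entry : map.entrySet()) {
            // Calculate Levenshtein distance, store closest match
            int dist = StringUtils.getLevenshteinDistance(entry.getKey(), key);
            if (dist < closestDist) {
                closestDist = dist;
                closestMatch = new DictionaryEntry(entry.getValue(), i);
            }
        }
    }
    // Closest match, or null if nothing was within the threshold
    return closestMatch;
}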
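For reference, the Levenshtein distance between two strings a and b is the minimum number of single-character insertions, deletions and substitutions that turn one into the other. With d(i, j) denoting the distance between the first i characters of a and the first j characters of b, the standard dynamic-programming recurrence is:

```latex
\begin{aligned}
d(i,0) &= i, \qquad d(0,j) = j \\
d(i,j) &= \min\bigl(d(i-1,j)+1,\ d(i,j-1)+1,\ d(i-1,j-1)+[a_i \ne b_j]\bigr)
\end{aligned}
```

For the examples above, d("bannana", "banana") = 1 (drop one "n") and d("a banana", "banana") = 2 (drop "a" and the space). Computing one distance fills an (|a|+1) × (|b|+1) table, which is why comparing every query against every stored key gets expensive.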

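So far everything works, and I get the entries I want. Unfortunately, I have to look up about 7000 strings in five dictionaries of different sizes (about 30k to 70k entries), and that takes a while. From my processing output I have the strong impression that the lookups dominate the total running time.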
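A rough count of the work supports that impression. This sketch assumes the 30k to 70k figure is per dictionary (it may also be meant as a total) and assumes keys of roughly 10 characters; both are assumptions, not figures from the question:

```latex
\begin{aligned}
\text{comparisons} &= Q \sum_{d=1}^{D} n_d, \qquad Q = 7000,\ D = 5,\ 30\,000 \le n_d \le 70\,000 \\
&\Rightarrow\ 7000 \cdot 5 \cdot 30\,000 = 1.05 \times 10^{9} \ \le\ \text{comparisons} \ \le\ 7000 \cdot 5 \cdot 70\,000 = 2.45 \times 10^{9} \\
\text{cells per comparison} &\approx 10 \cdot 10 = 100 \ \Rightarrow\ \text{roughly } 10^{11} \text{ DP cells in total} \\
\text{speedup with one thread per dictionary} &\le \frac{\sum_d n_d}{\max_d n_d} \le D = 5
\end{aligned}
```

The last line reflects that, with one thread per dictionary, the wall-clock time can never drop below the time needed to scan the largest dictionary.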

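My first idea for improving the runtime was to search all dictionaries in parallel. Since none of the dictionaries is ever modified, and no dictionary would be accessed by more than one thread at a time, I don't see any safety issues.

The only question is: how do I do that? I have never used multithreading before. My searches only turned up ConcurrentHashMap (which, as far as I understand, I don't need) and the Runnable class, where I would have to put my processing into a run() method. I suppose I could rewrite my current class to fit Runnable, but I wonder whether there is a simpler way to do this (or how I could do it simply with Runnable, since with my limited understanding I think I would have to restructure a lot).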
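One way to avoid writing a Runnable subclass is to hand lambdas to an ExecutorService from java.util.concurrent. The sketch below assumes the dictionaries are available as a List<Map<String, String>>; the class ParallelSearch and the method searchAll are hypothetical names, not part of the question's code. Each lambda returns a value, so it is submitted as a Callable, and Future.get() collects the per-dictionary results. Plain HashMaps are sufficient (no ConcurrentHashMap), provided the maps are fully built before the tasks are submitted: actions taken before submitting a task happen-before the task runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

public class ParallelSearch {

    /**
     * Runs search on every dictionary in its own pool thread and returns the
     * per-dictionary results in dictionary order (index i = dictionary i).
     * The maps are only read, never modified.
     */
    public static <R> List<R> searchAll(List<Map<String, String>> dictionaries,
                                        String key,
                                        BiFunction<Map<String, String>, String, R> search)
            throws InterruptedException, ExecutionException {
        // One thread per dictionary (at least one, since a pool of size 0 is invalid)
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dictionaries.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Map<String, String> dict : dictionaries) {
                // The lambda returns a value, so submit() treats it as a Callable
                futures.add(pool.submit(() -> search.apply(dict, key)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that dictionary is done
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```

For the fuzzy search, one would pass a function wrapping the question's getClosestMatch logic as search and then pick the entry with the smallest distance from the returned list. The calling code stays sequential; only the per-dictionary work moves into tasks.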

Since I was asked to share the Levenshtein logic: it's really nothing special, but here you go:

private int _maxLSDistance = 10;

public Map.Entry<String, String> getClosestMatch(String key) {
    Map.Entry<String, String> closestMatch = null;
    // Distance of the best match so far; a local (not a field), so concurrent calls don't interfere
    int lsDist = Integer.MAX_VALUE;

    if (key == null) {
        return null;
    }

    for (Map.Entry<String, String> entry : _dictionary.entrySet()) {
        // Perfect match
        if (entry.getKey().equals(key)) {
            return entry;
        }
        // Similar match (StringUtils from Apache Commons Lang)
        int dist = StringUtils.getLevenshteinDistance(entry.getKey(), key);

        // Keep the entry if "dist" is below the threshold and smaller than the stored entry's distance
        if (dist < _maxLSDistance && dist < lsDist) {
            closestMatch = entry;
            lsDist = dist;
        }
    }
    return closestMatch;
}
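The snippet above depends on Apache Commons Lang's StringUtils and on instance fields. A dependency-free sketch of the same logic follows; the class ClosestMatch and its levenshtein helper are hypothetical names, and the dictionary and threshold are passed as parameters instead of fields. Because it only uses local variables, several threads can call it at once on read-only maps.

```java
import java.util.Map;

public class ClosestMatch {

    /** Standard two-row dynamic-programming Levenshtein distance. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays: the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the entry whose key is closest to key, or null if no key has a
     * distance strictly below maxDistance. An exact match is returned immediately.
     */
    public static Map.Entry<String, String> getClosestMatch(Map<String, String> dictionary,
                                                            String key, int maxDistance) {
        if (key == null) {
            return null;
        }
        Map.Entry<String, String> best = null;
        int bestDist = maxDistance; // only distances below this are accepted
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            if (entry.getKey().equals(key)) {
                return entry; // perfect match
            }
            int dist = levenshtein(entry.getKey(), key);
            if (dist < bestDist) {
                best = entry;
                bestDist = dist;
            }
        }
        return best;
    }
}
```

With a dictionary containing "banana" and "cherry", looking up "bannana" or "a banana" with a threshold of 10 returns the "banana" entry, as the question describes.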

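Best answer

To use multithreading in your case, it could look something like this:

A "monitor" class that mainly stores the results and coordinates the threads: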

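import java.util.ArrayList;

public class Results {

    // Number of dictionary threads still running; must equal the number of threads started
    private int nrOfDictionaries = 4;

    private ArrayList<String> results = new ArrayList<String>();

    // Resets the monitor before a new round of searches
    public synchronized void prepare() {
        nrOfDictionaries = 4;
        results = new ArrayList<String>();
    }

    public synchronized void oneDictionaryFinished() {
        nrOfDictionaries--;
        System.out.println("one dictionary finished");
        notifyAll(); // wake up the thread waiting in isReady()
    }

    // Blocks until every dictionary thread has called oneDictionaryFinished()
    public synchronized boolean isReady() throws InterruptedException {

        while (nrOfDictionaries != 0) {
            wait();
        }

        return true;
    }

    public synchronized void addResult(String result) {
        results.add(result);
    }

    // Returns a copy, so callers never iterate over a list another thread is modifying
    public synchronized ArrayList<String> getAllResults() {
        return new ArrayList<String>(results);
    }
}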

The thread itself, which can be set up to search one specific dictionary:

public class ThreadDictionarySearch extends Thread {

    // Placeholder for the actual dictionary (here only its name)
    private final String dictionary;
    private final Results results;

    public ThreadDictionarySearch(Results results, String dictionary) {
        this.dictionary = dictionary;
        this.results = results;
    }

    @Override
    public void run() {
        try {
            // Dummy search: a real implementation would look up the keys in its dictionary here
            for (int i = 0; i < 4; i++) {
                results.addResult("result of " + dictionary);
                System.out.println("adding result from " + dictionary);
            }
        } finally {
            // Always report completion, otherwise isReady() would wait forever after an exception
            results.oneDictionaryFinished();
        }
    }

}

And a main method for the demo:

public static void main(String[] args) throws Exception {

    Results results = new Results();

    ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
    ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
    ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
    ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");

    threadA.start();
    threadB.start();
    threadC.start();
    threadD.start();

    // Execution stays here until all dictionaries are searched,
    // because Results.isReady() calls wait() until every thread has finished
    if (results.isReady()) {
        for (String string : results.getAllResults()) {
            System.out.println("RESULT: " + string);
        }
    }
}
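The counting and wait()/notifyAll() logic in Results is what java.util.concurrent.CountDownLatch provides ready-made. A sketch of the same demo using a latch and a synchronized list follows; the class LatchDemo and its searchAll method are hypothetical names, and the "search" is still a placeholder that only records one result per dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LatchDemo {

    public static List<String> searchAll(List<String> dictionaries) throws InterruptedException {
        // Thread-safe list, since several threads add results concurrently
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        // Starts at the number of dictionaries, like nrOfDictionaries in Results
        CountDownLatch done = new CountDownLatch(dictionaries.size());

        for (String dictionary : dictionaries) {
            new Thread(() -> {
                try {
                    // Placeholder for the real dictionary search
                    results.add("result of " + dictionary);
                } finally {
                    done.countDown(); // always signal, even if the search throws
                }
            }).start();
        }

        done.await(); // blocks until every thread has called countDown()
        return new ArrayList<>(results);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> dictionaries = Arrays.asList(
                "dictionary A", "dictionary B", "dictionary C", "dictionary D");
        for (String result : searchAll(dictionaries)) {
            System.out.println("RESULT: " + result);
        }
    }
}
```

Because the latch is sized from the list of dictionaries, there is no hard-coded count that has to match the number of threads started, and the order of the printed results depends on thread scheduling.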

Regarding "java - Searching multiple HashMaps at the same time", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31722393/
