java - 为什么在使用多个线程计算大文件的词频时答案会有所不同?

标签 java multithreading synchronized

我的目标是在使用多个线程读取大文件时计算每个单词的频率。
我正在实现 Runnable 接口(interface)以实现多线程。但是在执行程序时,我每次都没有得到正确的答案。有时,它给出了正确的输出,有时却没有。但是使用Callable接口(interface)而不是Runnable,程序可以正确执行,没有任何错误。
这是主要类(class):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequencyRunnableTest {

    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();
        String filePath = "C:/Users/Mukesh Kumar/Desktop/data.txt";
        WordFrequencyRunnableTest runnableTest = new WordFrequencyRunnableTest();
        Map<String, Integer> wordFrequencies = runnableTest.parseLines(filePath);
        runnableTest.printResult(wordFrequencies);
        long elapsedTime = System.currentTimeMillis() - startTime;
        System.out.println("Total execution time in millis: " + elapsedTime);
    }

    public Map<String, Integer> parseLines(String filePath) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
            String eachLine = bufferedReader.readLine();
            while (eachLine != null) {
                List<String> linesForEachThread = new ArrayList<>();
                while (linesForEachThread.size() != 100 && eachLine != null) {
                    linesForEachThread.add(eachLine);
                    eachLine = bufferedReader.readLine();
                }
                WordFrequencyUsingRunnable task = new WordFrequencyUsingRunnable(linesForEachThread, wordFrequencies);
                Thread thread = new Thread(task);
                thread.start();
            }
        }
        return wordFrequencies;
    }

    public void printResult(Map<String, Integer> wordFrequencies) {
        wordFrequencies.forEach((key, value) -> System.out.println(key + " " + value));
    }
}
这是逻辑类:
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordFrequencyUsingRunnable implements Runnable {

    private final List<String> linesForEachThread;
    private final Map<String, Integer> wordFrequencies;

    public WordFrequencyUsingRunnable(List<String> linesForEachThread, Map<String, Integer> wordFrequencies) {
        this.linesForEachThread = linesForEachThread;
        this.wordFrequencies = wordFrequencies;
    }

    @Override
    public void run() {
        List<String> currentThreadLines = new ArrayList<>(linesForEachThread);
        for (String eachLine : currentThreadLines) {
            String[] eachLineWords = eachLine.toLowerCase().split("([,.\\s]+)");
            synchronized (wordFrequencies) {
                for (String eachWord : eachLineWords) {
                    if (wordFrequencies.containsKey(eachWord)) {
                        wordFrequencies.replace(eachWord, wordFrequencies.get(eachWord) + 1);
                    }
                    wordFrequencies.putIfAbsent(eachWord, 1);
                }
            }
        }
    }
}
我希望得到良好的回应,并提前感谢您的帮助。

最佳答案

在打印结果之前,您应该等待所有线程关闭。

public class WordFrequencyRunnableTest {

    List<Thread> threads = new ArrayList<>();
    public static void main(String[] args) throws IOException {
        ...
        ...
        Map<String, Integer> wordFrequencies = runnableTest.parseLines(filePath);
        for(Thread thread: threads)
        {
           thread.join();
        }
        runnableTest.printResult(wordFrequencies);
        ...
        ...
    }

    public Map<String, Integer> parseLines(String filePath) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
            String eachLine = bufferedReader.readLine();
            while (eachLine != null) {
                List<String> linesForEachThread = new ArrayList<>();
                while (linesForEachThread.size() != 100 && eachLine != null) {
                    linesForEachThread.add(eachLine);
                    eachLine = bufferedReader.readLine();
                }
                WordFrequencyUsingRunnable task = new WordFrequencyUsingRunnable(linesForEachThread, wordFrequencies);
                Thread thread = new Thread(task);
                thread.start();
                threads.add(thread); // Add thread to the list.
            }
        }
        return wordFrequencies;
    }
}
PS - 你可以使用 ConcurrentHashMap<String, AtomicInteger>以避免必须同步对 hashmap 的访问。这样程序会运行得更快。

关于java - 为什么在使用多个线程计算大文件的词频时答案会有所不同?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63246273/

相关文章:

java - 如何使用 mscapi.RSAPrivateKey 进行 JWT 签名?

java - 为什么需要以毫秒为单位的超时以及为什么我必须为单独的 block 声明两个具有单独 run () 方法的类

java - 如何使用 Eclipse 在 Java 中的同一监视器上找到所有同步的内容?

c - 高效的C/C++多线程程序可对数据进行分区和处理

ios - 一旦我更改了 Settings Bundle 中的设置,我的应用程序能否同时获得更改?

java - ActionBar 问题 : NPE

java - 使用 Android Studio 3.4 android.support.v7.widget.CardView(修复构建路径、编辑 XML、创建类)

java - Ignite:仅使用核心来传递消息?

python - Python 中的异步函数或线程

java - 多线程数组内容?