java - How would you rate the following Java solution, or how would you solve it?

Tags: java software-quality

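How would you rate the structure, correctness, simplicity, and testability of the solution to the following task (time allotted: about 1 hour):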

Create a command-line Java program that counts unique words from a text file and lists the top 10 occurrences.

Assume an English locale and treat hyphens and apostrophes as part of a word. The output should look like the following:

and (514)
the (513)
i (446)
to (324)
a (310)
of (295)
my (288)
you (211)
that (188)
this (185)

Solution:

WordCalculator.java (main class)

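import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class WordCalculator {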

    /**
     * Counts unique words from a text file and lists the top 10 occurrences.
     *
     * @param args the command line arguments. First argument is the file path.
     * If omitted, user will be prompted to specify path.
     *
     * @throws java.io.FileNotFoundException if the file does not exist, is a
     * directory, or for some other reason cannot be opened for reading.
     *
     * @throws java.io.IOException If an I/O error occurs
     */
    public static void main(String[] args) throws FileNotFoundException, IOException {

        File file;
        List<String> listOfWords = new ArrayList<>();

        // If a command argument is specified, use it as the file path.
        // Otherwise prompt user for the path.
        if (args.length > 0) {

            file = new File(args[0]);

        } else {

            Scanner scanner = new Scanner(System.in);
            System.out.print("Enter path to file: ");
            file = new File(scanner.nextLine());

        }

        // Reads the file and splits the input into a list of words
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {

            String line;
            while ((line = br.readLine()) != null) {
                listOfWords.addAll(WordUtil.getWordsFromString(line));
            }

        } catch (FileNotFoundException ex) {

            Logger.getLogger(WordCalculator.class.getName()).log(Level.SEVERE,
                String.format("Cannot open file '%s' for reading.", file.getAbsolutePath()), ex);
            throw ex;

        } catch (IOException ex) {

            Logger.getLogger(WordCalculator.class.getName()).log(Level.SEVERE,
                "I/O error while reading input file.", ex);
            throw ex;

        }

        // Retrieves the top ten frequent words and their frequencies.
        Map<Object, Long> freqMap = FrequencyUtil.getItemFrequencies(listOfWords);
        List<Map.Entry<?, Long>> topTenWords = FrequencyUtil.limitFrequency(freqMap, 10);

        // Prints the top ten words and their frequencies.
        topTenWords.forEach((word) -> {
            System.out.printf("%s (%d)%n", word.getKey(), word.getValue());
        });
    }
}

FrequencyUtil.java

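import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FrequencyUtil {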

    /**
     * Transforms a list into a map with elements and their frequencies.
     *
     * @param list the list to parse
     * @return the item-frequency map.
     */
    public static Map<Object, Long> getItemFrequencies(List<?> list) {

        return list.stream()
                .collect(Collectors.groupingBy(obj -> obj, Collectors.counting()));

    }

    /**
     * Sorts a frequency map in descending order and limits the list.
     *
     * @param objFreq the map elements and their frequencies.
     * @param limit the limit of the returning list
     * @return a list with the top frequent words
     */
    public static List<Map.Entry<?, Long>> limitFrequency(Map<?, Long> objFreq, int limit) {

        return objFreq.entrySet().stream()
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .limit(limit)
            .collect(Collectors.toList());

    }

}

WordUtil.java

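import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordUtil {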

    public static final Pattern ENGLISH_WORD_PATTERN = Pattern.compile("[A-Za-z'\\-]+");

    /**
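     * Extracts the words from a string and converts them to lower case.
     *
     * @param s the string to parse into a list of words. Character sequences
     * not matching the English word pattern (a-z, A-Z, apostrophe, hyphen)
     * are omitted.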
     *
     * @return a list of the words
     *
     */
    public static List<String> getWordsFromString(String s) {

        ArrayList<String> list = new ArrayList<>();
        Matcher matcher = ENGLISH_WORD_PATTERN.matcher(s);

        while (matcher.find()) {

            list.add(matcher.group().toLowerCase());

        }

        return list;

    }

}

Best answer

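Your solution is correct. If you want something less functional and more object-oriented, though, avoid utility classes with static methods: WordCalculator can instead expose instance methods and keep a map of word counts as a field. The regular-expression matching is also relatively expensive, and you then loop over the extracted words a second time (in functional style) to put them into a map. Another option is to read the file one character at a time: whenever you hit a non-letter character (for a simple text file, checking for whitespace may be enough), move the word accumulated in a StringBuilder into the map and add 1 to its counter. This also avoids potential problems when the file is one huge single line of text.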
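As a rough sketch of the instance-based design described above: the class name `WordCounter` and its methods `add` and `top` are made up for illustration and do not come from the question. It assumes that ties between equally frequent words may be broken alphabetically, which the task does not specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical class: keeps the counts as instance state instead of static helpers.
final class WordCounter {

    private final Map<String, Integer> counts = new HashMap<>();

    // Records one occurrence of the word; lower-casing makes "The" and "the" the same key.
    void add(String word) {
        counts.merge(word.toLowerCase(Locale.ENGLISH), 1, Integer::sum);
    }

    // Returns at most 'limit' entries, most frequent first.
    // Assumption: equal counts are ordered alphabetically so the result is deterministic.
    List<Map.Entry<String, Integer>> top(int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        WordCounter counter = new WordCounter();
        for (String word : new String[] {"the", "and", "The", "a", "and", "and"}) {
            counter.add(word);
        }
        // Prints the two most frequent words in the "word (count)" format of the task.
        counter.top(2).forEach(e -> System.out.printf("%s (%d)%n", e.getKey(), e.getValue()));
    }
}
```

Because the counter has no static state, a unit test can create a fresh instance, feed it a handful of words, and assert on the result of `top` without touching the file system.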

Update 1 - added an example of reading the words:

private void readWords(File file) {

    try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
        StringBuilder build = new StringBuilder();

        int value;
        while ((value = bufferedReader.read()) != -1) {
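            // The task counts hyphens and apostrophes as part of a word;
            // digits are left out, as in the pattern used in the question.
            if (Character.isLetter(value) || value == '-' || value == '\'') {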
                build.append((char)Character.toLowerCase(value));
            } else {
                if(build.length()>0) {
                    addtoWordMap(build.toString());
                    build = new StringBuilder();
                }
            }
        }
        if(build.length()>0) {
            addtoWordMap(build.toString());
        }

    } catch(FileNotFoundException e) {
        //todo manage exception
        e.printStackTrace();
    } catch (IOException e) {
        //todo manage exception
        e.printStackTrace();
    }
}
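The reading loop above calls a helper (`addtoWordMap`) that is not shown and opens the file itself, which makes it hard to test. The following is only a sketch of one way to decouple it, under these assumptions: the names `WordReader`, `isWordChar`, and `forEachWord` are hypothetical, word characters are limited to ASCII letters, hyphen, and apostrophe (mirroring the pattern in the question), and the caller supplies a `Reader` (ideally a buffered one for files) plus a callback that receives each word.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits a character stream into words without regular expressions.
final class WordReader {

    // Assumption: a word consists of ASCII letters, hyphens, and apostrophes only.
    static boolean isWordChar(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '-' || c == '\'';
    }

    // Reads one character at a time and passes each lower-cased word to the sink,
    // so memory use does not depend on line length.
    static void forEachWord(Reader reader, Consumer<String> sink) throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (isWordChar(c)) {
                word.append((char) Character.toLowerCase(c));
            } else if (word.length() > 0) {
                sink.accept(word.toString());
                word.setLength(0); // reuse the builder for the next word
            }
        }
        // Flush the last word if the input does not end with a separator.
        if (word.length() > 0) {
            sink.accept(word.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> words = new ArrayList<>();
        forEachWord(new StringReader("Don't stop, well-known THE end"), words::add);
        System.out.println(words); // prints [don't, stop, well-known, the, end]
    }
}
```

Taking a `Reader` instead of a `File` means a test can pass a `StringReader`, while production code can pass a `BufferedReader` wrapped around the file; the callback could be whatever method updates the word map.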

For java - How would you rate the following Java solution, or how would you solve it?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45695374/
