java - Counting occurrences of word pairs in Java

Tags: java tokenize words


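I want to count how many times each pair of adjacent words occurs, i.e. "the quick", "quick brown", "brown fox", "fox jumps", "jumps over", and so on. My current code only counts single words: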



    import java.util.HashMap;
    import java.util.Map;
    import java.util.StringTokenizer;

    public class Tokenizer {

        public static void main(String[] args) {
            Map<String, Integer> wordCount = new HashMap<>();
            Map<Integer, Integer> letterCount = new HashMap<>();
            String message = "The Quick brown fox jumps over the lazy brown dog the quick";

            StringTokenizer string = new StringTokenizer(message);

            // Count single words, case-insensitively
            int tokenCount = string.countTokens();
            System.out.println("Number of tokens = " + tokenCount);
            while (string.hasMoreTokens()) {
                String word = string.nextToken().toLowerCase();
                Integer count = wordCount.get(word);
                if (count == null) {
                    wordCount.put(word, 1);
                } else {
                    wordCount.put(word, count + 1);
                }
            }
            for (String words : wordCount.keySet()) {
                System.out.println("Word : " + words + " has count :" + wordCount.get(words));
            }

            // Find the two most frequent words
            int first, second;
            first = second = Integer.MIN_VALUE;
            String firstword = "";
            String secondword = "";

            for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
                int count = entry.getValue();
                String word = entry.getKey();
                if (count > first) {
                    // New maximum: the old maximum becomes the runner-up
                    second = first;
                    secondword = firstword;
                    first = count;
                    firstword = word;
                } else if (count > second) {
                    second = count;
                    secondword = word;
                }
            }
            System.out.println(firstword + " " + first);
            System.out.println(secondword + " " + second);

            // Count characters, ignoring spaces (case-sensitive)
            for (int i = 0; i < message.length(); i++) {
                char c = message.charAt(i);
                if (c != ' ') {
                    int value = letterCount.getOrDefault((int) c, 0);
                    letterCount.put((int) c, value + 1);
                }
            }
            for (int key : letterCount.keySet()) {
                System.out.println((char) key + ": " + letterCount.get(key));
            }
        }
    }
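The word-count loop above can be adapted to pairs with little change: remember the previous token and count `previous + " " + current`. This is a minimal sketch, assuming the same `StringTokenizer` splitting and lower-casing as the question; the class `PairTokenizer` and method `countPairs` are hypothetical names, and `Map.merge` (Java 8+) stands in for the get/null-check/put sequence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper class: counts adjacent word pairs in one pass.
public class PairTokenizer {

    // Lower-cases each token, as the question's word-count loop does.
    static Map<String, Integer> countPairs(String message) {
        Map<String, Integer> pairCount = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(message);
        String previous = null; // token from the previous iteration
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken().toLowerCase();
            if (previous != null) {
                // merge() stores 1 for a new pair, or adds 1 to an existing count
                pairCount.merge(previous + " " + word, 1, Integer::sum);
            }
            previous = word;
        }
        return pairCount;
    }

    public static void main(String[] args) {
        String message = "The Quick brown fox jumps over the lazy brown dog the quick";
        for (Map.Entry<String, Integer> e : countPairs(message).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

For the sample sentence, "the quick" is counted twice and the other nine pairs once each; `HashMap` iteration order is unspecified, so the printed order may vary.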




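  1. Split the source string on spaces.
  2. Join each pair of adjacent words, separated by a space.
  3. Look the joined pair up in the map.
  4. If it is not there, add it to the map with the word pair as the key and 1 as the value.
  5. If it is there, get the pair's current value from the map, increment it, and put it back.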

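    import java.util.HashMap;
    import java.util.Map;

    public class WordPairs {
        public static void main(String[] args) {
            String message = "The Quick brown fox jumps over the lazy brown dog the quick";
            // Lower-case once so "The quick" and "the quick" are the same pair
            String[] split = message.toLowerCase().split(" ");
            Map<String, Integer> map = new HashMap<>();
            for (int i = 0; i < split.length - 1; i++) {
                // Join each word with the word that follows it
                String temp = split[i] + " " + split[i + 1];
                if (map.containsKey(temp)) {
                    map.put(temp, map.get(temp) + 1); // seen before: increment
                } else {
                    map.put(temp, 1);                 // first occurrence
                }
            }
            System.out.println(map);
        }
    }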
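A variant of the same approach, sketched under two assumptions not in the answer: words may be separated by any run of whitespace (so it splits on the regex `\\s+` rather than a single space), and the pairs should be listed most frequent first, in the spirit of the question's "top two" search. `PairCounter`, `countPairs` and `byFrequency` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for counting and ranking adjacent word pairs.
public class PairCounter {

    // trim() avoids an empty first token; "\\s+" treats tabs and
    // repeated spaces as a single separator.
    static Map<String, Integer> countPairs(String message) {
        String[] words = message.trim().toLowerCase().split("\\s+");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < words.length - 1; i++) {
            counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    // Returns the map's entries ordered by descending count.
    static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        String message = "The Quick brown fox jumps over the lazy brown dog the quick";
        for (Map.Entry<String, Integer> e : byFrequency(countPairs(message))) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

Sorting the entry list keeps the counting step unchanged; if only the top one or two pairs are needed, the single-pass comparison from the question's code avoids the sort.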

Regarding java - Counting occurrences of word pairs in Java, we found a similar question on Stack Overflow.