java - Counting the frequency of words in a sorted list

Tags: java loops exception frequency

public static void frequencyFinder() throws FileNotFoundException, IOException {
    String foldername = ".../Meta_Oct/separate";
    File folder = new File(foldername);
    File[] listOfFiles = folder.listFiles();


    String line;
    for (int x = 0; x < listOfFiles.length; x++) {
        BufferedReader in = new BufferedReader(new FileReader(listOfFiles[x]));
        String filename = listOfFiles[x].getName();
        String language = filename.split("@")[0];
        String target = filename.split("@")[1];
        String source = filename.split("@")[2];
        int frequency = 0;

        while ((line = in.readLine()) != null) {
            lemma_match = line.split(";")[3];
            frequency = 1;
            while((in.readLine().split(";")[3]).equals(lemma_match)){                 
                frequency++;
                line = in.readLine();                    
            }

            System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
            frequency = 0;                
            lemma_match = null;
        }


    }
}

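I need to count the frequency of the words in the last column. The problem is that the while loop skips some lines and eventually throws a NullPointerException, and even before that point it does not count all the frequencies. I've attached the stack trace and a sample file below.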

EN;GOVERNMENT;DISEASE;bristle at 
EN;GOVERNMENT;DISEASE;contract 
EN;GOVERNMENT;DISEASE;detect in 
EN;GOVERNMENT;DISEASE;detect in 
EN;GOVERNMENT;DISEASE;immunize against 
EN;GOVERNMENT;DISEASE;inherit from 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;stave off 
EN;GOVERNMENT;DISEASE;stave off 
EN;GOVERNMENT;DISEASE;transmit 
EN;GOVERNMENT;DISEASE;treat 
EN;GOVERNMENT;DISEASE;treat 
EN;GOVERNMENT;DISEASE;treat as 
EN;GOVERNMENT;DISEASE;treat by 
EN;GOVERNMENT;DISEASE;ward off 

Stack trace:

GOVERNMENT:DISEASE:bristle at :1
GOVERNMENT:DISEASE:detect in :2
GOVERNMENT:DISEASE:spread :2
GOVERNMENT:DISEASE:stave off :1
Exception in thread "main" java.lang.NullPointerException
GOVERNMENT:DISEASE:treat :2
    at javaapplication6.FrequencyFinder.frequencyFinder(FrequencyFinder.java:53)
    at javaapplication6.FrequencyFinder.main(FrequencyFinder.java:26)
Java Result: 1

Best answer

The following code is the problem:

    while ((line = in.readLine()) != null) { // here you read a line
        lemma_match = line.split(";")[3];
        frequency = 1;
        while((in.readLine().split(";")[3]).equals(lemma_match)){ // here you read
                                                                  // another line
            frequency++;
            line = in.readLine(); // here you read another line                   
        }

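Because you read a new line in three places in this code, not every read increments the frequency. For example, each iteration of the inner loop reads two lines but increments the frequency only once. Even if you fixed the inner loop, you would still miss lines: the line that ends the inner while loop has already been consumed, and the outer while loop then reads yet another new line.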
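
To see the skipping effect in isolation, here is a small self-contained sketch in which the loop body makes one extra readLine() call per iteration. The class and method names are hypothetical, and the input is an in-memory string rather than one of the files. Every line consumed by the extra call is never examined by the loop:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineSkipDemo {
    // Hypothetical helper: returns the lines the outer loop actually examines
    // when its body calls readLine() one extra time per iteration.
    static List<String> linesSeenByOuterLoop(String text) {
        List<String> seen = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                seen.add(line);
                in.readLine(); // extra read: this line is consumed but never looked at
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Five input lines, but the loop only examines every other one.
        System.out.println(linesSeenByOuterLoop("a\nb\nc\nd\ne")); // prints [a, c, e]
    }
}
```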

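In addition, the inner while loop is what throws the NullPointerException: at the end of the file in.readLine() returns null, and you call split on the result without first checking in.readLine() != null.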
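
For comparison, the nested-loop shape from the question can be kept if each line is read in exactly one place and the current line is carried over as a lookahead, checked for null before it is split. This is a minimal sketch, not the answerer's code: the class and the helper `lemmaOf` are hypothetical, input comes from a string instead of the files, the `target`/`source` prefix is omitted, and lemmas are trimmed of trailing spaces (an assumption; the question compares untrimmed strings).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LookaheadCounter {
    // Hypothetical helper: the lemma is the 4th ';'-separated field, trimmed of trailing spaces.
    static String lemmaOf(String line) {
        return line.split(";")[3].trim();
    }

    // Counts runs of equal lemmas in sorted input. The nested-loop shape is kept,
    // but each line is read exactly once and checked for null before it is split.
    static List<String> countRuns(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line = in.readLine(); // lookahead: the next line not yet counted
            while (line != null) {
                String lemma = lemmaOf(line);
                int frequency = 0;
                // Consume every consecutive line with the same lemma.
                while (line != null && lemmaOf(line).equals(lemma)) {
                    frequency++;
                    line = in.readLine(); // the only read inside the loops
                }
                // 'line' now holds the first line of the next lemma (or null), so nothing is skipped.
                result.add(lemma + ":" + frequency);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "EN;GOVERNMENT;DISEASE;detect in ",
                "EN;GOVERNMENT;DISEASE;detect in ",
                "EN;GOVERNMENT;DISEASE;spread ",
                "EN;GOVERNMENT;DISEASE;spread ",
                "EN;GOVERNMENT;DISEASE;spread ",
                "EN;GOVERNMENT;DISEASE;ward off ");
        countRuns(sample).forEach(System.out::println); // detect in:2, spread:3, ward off:1
    }
}
```

Because the line that ends a run is handed to the next outer iteration instead of being discarded, the last group in the file is also counted.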

Now let's see how to do it with a single loop:

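    String lemma_match = "";
    while ((line = in.readLine()) != null) {
        String new_lemma_match = line.split(";")[3];
        if (!lemma_match.equals(new_lemma_match)) { // start count for a new lemma
            if (!lemma_match.equals("")) { // print the finished count of the previous lemma
                System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
            }
            lemma_match = new_lemma_match;
            frequency = 1; // initialize frequency for new lemma
        } else {
            frequency++; // increase frequency for current lemma
        }
    }
    if (!lemma_match.equals("")) { // the last lemma is never followed by a new one, so print it here
        System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
    }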
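
The single loop relies on the file being sorted so that equal lemmas are adjacent. If that were not guaranteed, a different technique (not the answerer's method) is to count with a map. A minimal sketch, assuming Java 8+ and the same `;`-separated format with the lemma in the fourth field; the class and method names are hypothetical, and lemmas are trimmed of trailing spaces:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MapFrequencyCounter {
    // Hypothetical helper: counts lemmas (4th ';'-separated field, trimmed) in any order.
    // LinkedHashMap keeps the lemmas in the order they are first seen.
    static Map<String, Long> countLemmas(BufferedReader in) {
        return in.lines()                                   // I/O errors surface as UncheckedIOException
                .filter(line -> !line.trim().isEmpty())     // ignore blank lines
                .map(line -> line.split(";")[3].trim())
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String unsorted = String.join("\n",
                "EN;GOVERNMENT;DISEASE;spread ",
                "EN;GOVERNMENT;DISEASE;treat ",
                "EN;GOVERNMENT;DISEASE;spread ");
        Map<String, Long> counts = countLemmas(new BufferedReader(new StringReader(unsorted)));
        counts.forEach((lemma, n) -> System.out.println(lemma + ":" + n)); // spread:2, then treat:1
    }
}
```

Unlike the run-based loop, this keeps every distinct lemma in memory at once, which is the trade-off for not needing sorted input.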

For java - Counting the frequency of words in a sorted list, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26389195/
