public static void frequencyFinder() throws FileNotFoundException, IOException {
    String foldername = ".../Meta_Oct/separate";
    File folder = new File(foldername);
    File[] listOfFiles = folder.listFiles();
    String line;
    for (int x = 0; x < listOfFiles.length; x++) {
        BufferedReader in = new BufferedReader(new FileReader(listOfFiles[x]));
        String filename = listOfFiles[x].getName();
        String language = filename.split("@")[0];
        String target = filename.split("@")[1];
        String source = filename.split("@")[2];
        int frequency = 0;
        while ((line = in.readLine()) != null) {
            lemma_match = line.split(";")[3];
            frequency = 1;
            while ((in.readLine().split(";")[3]).equals(lemma_match)) {
                frequency++;
                line = in.readLine();
            }
            System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
            frequency = 0;
            lemma_match = null;
        }
    }
}
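I need to count the frequency of the words in the last column. The problem is that the while loop skips some lines and eventually throws a NullPointerException, and even before that point it doesn't count all the frequencies correctly. I've attached a sample file and the stack trace below.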
EN;GOVERNMENT;DISEASE;bristle at
EN;GOVERNMENT;DISEASE;contract
EN;GOVERNMENT;DISEASE;detect in
EN;GOVERNMENT;DISEASE;detect in
EN;GOVERNMENT;DISEASE;immunize against
EN;GOVERNMENT;DISEASE;inherit from
EN;GOVERNMENT;DISEASE;spread
EN;GOVERNMENT;DISEASE;spread
EN;GOVERNMENT;DISEASE;spread
EN;GOVERNMENT;DISEASE;stave off
EN;GOVERNMENT;DISEASE;stave off
EN;GOVERNMENT;DISEASE;transmit
EN;GOVERNMENT;DISEASE;treat
EN;GOVERNMENT;DISEASE;treat
EN;GOVERNMENT;DISEASE;treat as
EN;GOVERNMENT;DISEASE;treat by
EN;GOVERNMENT;DISEASE;ward off
Output and stack trace:
GOVERNMENT:DISEASE:bristle at :1
GOVERNMENT:DISEASE:detect in :2
GOVERNMENT:DISEASE:spread :2
GOVERNMENT:DISEASE:stave off :1
Exception in thread "main" java.lang.NullPointerException
GOVERNMENT:DISEASE:treat :2
at javaapplication6.FrequencyFinder.frequencyFinder(FrequencyFinder.java:53)
at javaapplication6.FrequencyFinder.main(FrequencyFinder.java:26)
Java Result: 1
Best answer
The problem is in this part of the code:
while ((line = in.readLine()) != null) {   // here you read a line
    lemma_match = line.split(";")[3];
    frequency = 1;
    while ((in.readLine().split(";")[3]).equals(lemma_match)) {   // here you read another line
        frequency++;
        line = in.readLine();   // here you read yet another line
    }
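Since you read a new line in three places in this code, you don't increment the frequency for all of those reads. For example, in each iteration of the inner loop you read two lines but increment the frequency only once. Even if you fixed the inner loop, you would still miss lines: when the inner while loop ends, the line that ended it is thrown away, because the outer while loop reads a new one.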
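To make the three read sites concrete, here is a small sketch that replays the question's loop structure and records which readLine() call consumed each line. It is illustrative only: the names ReadTraceDemo, traceReads and lemmaOf are hypothetical, the input is an in-memory string instead of the question's files, and null checks that the original code lacks are added so the demo terminates instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: replays the question's loop shape on in-memory text.
public class ReadTraceDemo {

    // Returns the fourth ';'-separated field (the lemma) of a line.
    static String lemmaOf(String line) {
        return line.split(";")[3];
    }

    // Records, for every line, which readLine() call consumed it.
    // The null checks are an addition; the question's code has none.
    static List<String> traceReads(String text) {
        List<String> trace = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {           // read site 1: outer loop
                String lemmaMatch = lemmaOf(line);
                trace.add("outer: " + lemmaMatch);
                String next;
                while ((next = in.readLine()) != null) {        // read site 2: inner condition
                    trace.add("cond:  " + lemmaOf(next));
                    if (!lemmaOf(next).equals(lemmaMatch)) {
                        break; // this line is dropped; the outer loop reads a fresh one
                    }
                    line = in.readLine();                       // read site 3: inner body
                    if (line == null) {
                        break;
                    }
                    trace.add("body:  " + lemmaOf(line));      // consumed but never compared
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return trace;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "EN;GOVERNMENT;DISEASE;bristle at",
                "EN;GOVERNMENT;DISEASE;contract",
                "EN;GOVERNMENT;DISEASE;detect in",
                "EN;GOVERNMENT;DISEASE;detect in",
                "EN;GOVERNMENT;DISEASE;immunize against",
                "EN;GOVERNMENT;DISEASE;inherit from");
        traceReads(sample).forEach(System.out::println);
    }
}
```

For these six lines the trace shows immunize against consumed by the inner loop body and never compared, while contract and inherit from are read by the inner condition, fail the comparison, and are then lost because the outer loop reads a new line.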
In addition, the inner while loop gives you the NullPointerException, because you don't check in.readLine() != null before trying to split the line it returns.
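If you prefer to keep the question's two-loop shape, one way to fix both problems is to read ahead: let the inner loop's condition do the only read inside the run, test for null before splitting, and keep the line that ends a run so the outer loop starts from it instead of reading a new one. This is a sketch, not the answer author's code; LookaheadCounter, countRuns and lemmaOf are hypothetical names, the input is an in-memory string, and the output omits the target and source prefixes for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: nested-loop counting with a lookahead line.
public class LookaheadCounter {

    // Returns the fourth ';'-separated field (the lemma) of a line.
    static String lemmaOf(String line) {
        return line.split(";")[3];
    }

    // Counts runs of equal lemmas in input sorted by lemma. Every line is read
    // exactly once: the line that ends a run stays in `line` and starts the next run.
    static List<String> countRuns(String text) {
        List<String> counts = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line = in.readLine();
            while (line != null) {
                String lemmaMatch = lemmaOf(line);
                int frequency = 1;
                // && short-circuits, so split() is never called on a null line
                while ((line = in.readLine()) != null && lemmaOf(line).equals(lemmaMatch)) {
                    frequency++;
                }
                counts.add(lemmaMatch + ":" + frequency);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "EN;GOVERNMENT;DISEASE;spread",
                "EN;GOVERNMENT;DISEASE;spread",
                "EN;GOVERNMENT;DISEASE;spread",
                "EN;GOVERNMENT;DISEASE;stave off",
                "EN;GOVERNMENT;DISEASE;stave off",
                "EN;GOVERNMENT;DISEASE;ward off");
        countRuns(sample).forEach(System.out::println);
    }
}
```

For the six lines in main this prints spread:3, stave off:2 and ward off:1, including the final lemma that the original code never reached.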
Now let's see how to do this with a single loop:
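String lemma_match = "";
while ((line = in.readLine()) != null) {
    String new_lemma_match = line.split(";")[3];
    if (!lemma_match.equals(new_lemma_match)) { // start count for a new lemma
        if (!lemma_match.equals("")) {
            // print the count of the lemma that just ended
            System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
        }
        lemma_match = new_lemma_match;
        frequency = 1; // initialize frequency for new lemma
    } else {
        frequency++; // increase frequency for current lemma
    }
}
// a lemma is only printed when the next one starts, so print the last one here
if (!lemma_match.equals("")) {
    System.out.println(target + ":" + source + ":" + lemma_match + ":" + frequency);
}
in.close(); // release the file before moving on to the next one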
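The single-loop approach relies on the input being sorted by lemma; on unsorted input the same lemma would be reported once per run. If sorting cannot be guaranteed, a different technique, counting in a map, avoids that assumption. The sketch below uses LinkedHashMap with Map.merge; MapFrequencyCounter and countLemmas are hypothetical names, and the lines are passed in as a List<String> rather than read from the question's files.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: order-independent lemma counting.
public class MapFrequencyCounter {

    // Counts each lemma (fourth ';'-separated field) in a single pass.
    // LinkedHashMap keeps lemmas in first-seen order; the input need not be sorted.
    static Map<String, Integer> countLemmas(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(";");
            if (fields.length < 4) {
                continue; // skip blank or malformed lines
            }
            counts.merge(fields[3], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "EN;GOVERNMENT;DISEASE;treat",
                "EN;GOVERNMENT;DISEASE;spread",
                "EN;GOVERNMENT;DISEASE;treat");
        countLemmas(lines).forEach((lemma, n) -> System.out.println(lemma + ":" + n));
    }
}
```

For the three lines in main this prints treat:2 followed by spread:1. When reading from a file, the list could come from java.nio.file.Files.readAllLines(path); the trade-off is that all distinct lemmas are held in memory at once, whereas the sorted single-loop version only keeps the current one.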
For "java - Counting the frequency of words in a sorted list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26389195/