I want to remove stop words in Java.
So I read the stop words from a text file
and store them in a Set:
Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader br = new BufferedReader(new FileReader("stopwords.txt"));
String words = null;
while ((words = br.readLine()) != null) {
    stopWords.add(words.trim());
}
br.close();
Then I read another text file.
I want to remove from that text file every string that also appears in the stop-word set.
How can I do that?
Best answer
Use a Set for the stop words:
Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader SW= new BufferedReader(new FileReader("StopWord.txt"));
for (String line; (line = SW.readLine()) != null; ) {
    stopWords.add(line.trim());
}
SW.close();
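The same loading step could also be written with try-with-resources, so the reader is closed even if reading throws. This is only a sketch: the class name StopWordLoader and the helpers readStopWords and loadStopWords are hypothetical, and it assumes the file is UTF-8 with one stop word per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

class StopWordLoader {

    // Collects one stop word per line, trimmed; blank lines are skipped.
    static Set<String> readStopWords(BufferedReader reader) {
        return reader.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Opens the file with try-with-resources so it is always closed.
    // Assumption: the file is encoded as UTF-8.
    static Set<String> loadStopWords(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return readStopWords(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        Set<String> stopWords = loadStopWords(Paths.get("StopWord.txt"));
        System.out.println(stopWords.size() + " stop words loaded");
    }
}
```

Skipping blank lines keeps the empty string out of the set, which matters if the input text is later split in a way that can produce empty tokens.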
And use an ArrayList for the words of the input file txt_file.txt:
BufferedReader br = new BufferedReader(new FileReader("txt_file.txt"));
// build your ArrayList of words from br here
// deletStopWord() drops every word in the list that appears in StopWord.txt
public ArrayList<String> deletStopWord(Set<String> stopWords, ArrayList<String> arraylist) {
    ArrayList<String> newList = new ArrayList<String>();
    // Check every word, starting from the first, and keep only the non-stop words.
    for (String word : arraylist) {
        if (!stopWords.contains(word)) {
            newList.add(word);
        }
    }
    System.out.println(newList);
    return newList;
}
arraylist=deletStopWord(stopWords,arraylist);
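The answer leaves the "make your arraylist" step open, so here is one possible end-to-end sketch. It assumes words are separated by anything that is not a letter or digit and that the stop-word list is stored in lower case. The class StopWordFilterDemo and the methods tokenize and removeStopWords are hypothetical names, not part of the original answer, and the stop words are hard-coded only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordFilterDemo {

    // Splits text into words on runs of characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split can yield a leading empty token
                words.add(token);
            }
        }
        return words;
    }

    // Returns the words that are not stop words.
    // Assumption: stopWords holds lower-case entries, so each word is lower-cased before lookup.
    static List<String> removeStopWords(Set<String> stopWords, List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!stopWords.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "a", "of"));
        List<String> words = tokenize("The cat is a friend of the dog.");
        System.out.println(removeStopWords(stopWords, words)); // prints [cat, friend, dog]
    }
}
```

Because the lookup goes through a Set, each word is checked in roughly constant time, so the whole pass stays linear in the number of words in the input.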
Regarding "java - How to remove stop words in Java?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12469332/