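I'm trying to read a technical paper, split it into sentences, use a filter to find key terms and phrases in those sentences, and then create my own summary.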
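What I have so far is two BufferedReaders: one reads a text file containing the paragraphs, and the other reads my filter. Each line is then stored in an ArrayList and printed to the console to check that the files are read correctly.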
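I'd like to know whether using BufferedReader instead of Scanner is the right way to approach this. For now, I just want to be able to print every complete sentence ending in '.' (period), '!' (exclamation mark), or '?' (question mark), so I know the file is being read correctly.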
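A minimal sketch of the sentence-printing step, assuming a sentence ends with '.', '!' or '?' followed by whitespace. Because one sentence can span several lines of the file, the lines collected in the ArrayList are joined with spaces first and then split with a regular expression. The class and method names (SentenceSplitter, splitSentences) are hypothetical. The approach is naive: an abbreviation such as "e.g." followed by a space will be treated as a sentence end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceSplitter {

    // Splits text after '.', '!' or '?' when that character is followed by whitespace.
    // The lookbehind keeps the punctuation attached to the sentence it ends.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) { // "".split(...) yields one empty string; skip it
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Stand-in for the lines read into lines1; note the sentence spanning two lines.
        List<String> lines = Arrays.asList("This is one. Is this", "two? Yes, three!");
        String text = String.join(" ", lines);
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

For this purpose, BufferedReader and Scanner can both read the lines; the sentence boundaries only become visible once the lines are joined.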
Here is my code so far:
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;
import java.io.*;
import java.util.Scanner;

public class Filtering {
    public static void main(String[] args) throws IOException {
        ArrayList<String> lines1 = new ArrayList<String>();
        ArrayList<String> lines2 = new ArrayList<String>();
        try {
            FileInputStream fstream1 = new FileInputStream("paper.txt");
            FileInputStream fstream2 = new FileInputStream("filter2.txt");
            DataInputStream inStream1 = new DataInputStream(fstream1);
            DataInputStream inStream2 = new DataInputStream(fstream2);
            BufferedReader br1 = new BufferedReader(
                    new InputStreamReader(inStream1));
            BufferedReader br2 = new BufferedReader(
                    new InputStreamReader(inStream2));
            String strLine1;
            String strLine2;
            while ((strLine1 = br1.readLine()) != null) {
                lines1.add(strLine1);
            }
            while ((strLine2 = br2.readLine()) != null) {
                lines2.add(strLine2);
            }
            inStream1.close();
            inStream2.close();
        }
        catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
        System.out.println(lines1);
        System.out.println(lines2);
    }
}
Best answer
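- Using a BufferedReader to read any file is good practice, because it buffers the input instead of fetching each byte from the file individually.
- The DataInputStream is not needed.
- You should specify the character encoding in the InputStreamReader.
- You can accumulate all the strings in a StringBuilder so that the whole text is held in a single reference.
- You may want to look at BreakIterator for splitting text into sentences. Take a look at getSentenceInstance().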
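The first four points can be combined in a small sketch, assuming Java 7 or later and UTF-8 input files. It uses Files.newBufferedReader with an explicit charset (no DataInputStream), accumulates the lines in a StringBuilder, and relies on try-with-resources to close the reader even when readLine() throws, in place of a manual finally block. The class and method names (ReadWholeFile, readAll) are hypothetical, and the exception is propagated to the caller rather than printed and swallowed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWholeFile {

    // Reads the whole file into one String, decoding it explicitly as UTF-8.
    // Each line is re-terminated with '\n' so line boundaries are preserved.
    public static String readAll(Path path) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader automatically, even on an exception.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "paper.txt" and "filter2.txt" are the file names used in the question.
        System.out.println(readAll(Paths.get("paper.txt")));
        System.out.println(readAll(Paths.get("filter2.txt")));
    }
}
```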
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.BreakIterator;

public class Filtering {
    public static void main(String[] args) throws IOException {
        File paperFile = new File("paper.txt");
        File filterFile = new File("filter2.txt");
        // If you like, you can initialize the StringBuilders with their
        // approximate final capacity
        StringBuilder paper = new StringBuilder();
        StringBuilder filter2 = new StringBuilder();
        FileInputStream fstream1 = null;
        FileInputStream fstream2 = null;
        try {
            fstream1 = new FileInputStream(paperFile);
            fstream2 = new FileInputStream(filterFile);
            // Specify the encoding explicitly instead of relying on the platform default
            BufferedReader br1 = new BufferedReader(new InputStreamReader(fstream1, "UTF-8"));
            BufferedReader br2 = new BufferedReader(new InputStreamReader(fstream2, "UTF-8"));
            String strLine1;
            String strLine2;
            while ((strLine1 = br1.readLine()) != null) {
                paper.append(strLine1).append('\n');
            }
            while ((strLine2 = br2.readLine()) != null) {
                filter2.append(strLine2).append('\n');
            }
        }
        catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        } finally {
            // Closing the underlying streams also releases the file handles
            if (fstream1 != null) {
                fstream1.close();
            }
            if (fstream2 != null) {
                fstream2.close();
            }
        }
        String paperString = paper.toString();
        String filterString = filter2.toString();
        System.out.println(paperString);
        System.out.println(filterString);
        // To break it into sentences
        BreakIterator boundary = BreakIterator.getSentenceInstance();
        boundary.setText(paperString);
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
            System.out.println(paperString.substring(start, end));
        }
    }
}
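To go one step further toward the goal in the question (keeping only sentences that contain key terms), the BreakIterator loop can feed a filter. This is a sketch under assumptions not stated in the original: the filter file holds one key term per line, matching is a case-insensitive substring test, and Locale.US is passed to getSentenceInstance for predictable English rules. The names SentenceFilter, sentences and filter are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SentenceFilter {

    // Splits text into sentences with BreakIterator, trimming trailing whitespace/newlines.
    public static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.US);
        boundary.setText(text);
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    // Keeps only the sentences that contain at least one key term
    // (case-insensitive substring match; a simplifying assumption).
    public static List<String> filter(List<String> sentences, List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            String lower = sentence.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                String t = term.trim().toLowerCase(Locale.ROOT);
                if (!t.isEmpty() && lower.contains(t)) {
                    kept.add(sentence);
                    break; // one matching term is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-ins for paperString and the terms read from filter2.txt
        // (e.g. obtained with filterString.split("\n")).
        String paper = "We propose a new parser. The weather was nice. Our parser is fast!";
        List<String> terms = Arrays.asList("parser");
        for (String s : filter(sentences(paper), terms)) {
            System.out.println(s);
        }
    }
}
```

A substring test will also match a term inside a longer word; a word-boundary regular expression would be stricter if that matters for your summary.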
Regarding "java - How to print complete sentences after reading a text file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10416041/