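I am working on an independent project and am having trouble converting a text file into the right format. Right now my program reads the file line by line and assumes that one line equals one sentence. That is a problem, because someone could supply a paragraph with punctuation scattered throughout. What I want to do is put each sentence on its own line and then read from that file. I did not want to come empty-handed, so I tried the only approach I could think of. I got it working for short strings, but once I moved to longer text files I had to use streams, and I ran into a problem: (File name too long)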
Example:
Input: This is a dummy sentence. Hello, this is one too. And this one.
Output:
This is a dummy sentence.
Hello, this is one too.
And this one.
This works:
public static void main(String args[])
{
String text = "Joanne had one requirement: Her child must be" +
" adopted by college graduates. So the doctor arranged" +
" for the baby to be placed with a lawyer and his wife." +
" Paul and Clara named their new baby Steven Paul Jobs.";
Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
Matcher matcher = pattern.matcher(text);
StringBuilder text_fixed = new StringBuilder();
String withline = "";
int starter = 0;
String overall = "";
String blankspace = " ";
while (matcher.find())
{
int holder = matcher.start();
System.out.println("=========> " + holder);
/***/
withline = text.substring(starter, holder + 1);
withline = withline + "\r\n";
overall = overall + withline;
System.out.println(withline);
starter = holder + 2; // skip the punctuation mark and the single space assumed to follow it
}
System.out.println(overall);
//return overall;
}
This is where the problem occurs:
public static void main(String[] args) throws IOException
{
final String INPUT_FILE = "practice.txt";
InputStream in = new FileInputStream(INPUT_FILE);
String fixread = getStringFromInputStream(in);
String fixedspace = fixme(fixread);
File ins = new File(fixedspace);
BufferedReader reader = new BufferedReader(new FileReader(ins));
Pattern p = Pattern.compile("\n");
String line, sentence;
String[] t;
while ((line = reader.readLine()) != null )
{
t = p.split(line); /**hold curr sentence and remove it from OG txt file since you will reread.*/
sentence = t[0];
indiv_sentences.add(sentence);
}
//putSentencestoTrie(indiv_sentences);
//runAutocompletealt();
}
private static String fixme(String fixread)
{
Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
String actString = fixread.toString();
Matcher matcher = pattern.matcher(actString);
String withline = "";
int starter = 0;
String overall = "";
while (matcher.find())
{
int holder = matcher.start();
withline = actString.substring(starter, holder + 1);
withline = withline + "\r\n";
overall = overall + withline;
starter = holder + 2;
}
return overall;
}
/**this is not my code, this was provided by an outside source, I do not take credit*/
/**http://www.mkyong.com/java/how-to-convert-inputstream-to-string-in-java/*/
private static String getStringFromInputStream(InputStream is) {
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
sb.append(line);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return sb.toString();
}
Best answer
The problem is that you are creating a File whose name is the text content itself, and that is far too long to be a file name.
String fixedspace = fixme(fixread);
File ins = new File(fixedspace);//this is the issue, you gave the content as its name
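A minimal sketch of why this fails, using a hypothetical class name `PathVersusContent` and helper `canOpenAsFile` that are not part of the original code: the `File(String)` constructor treats its argument as a path and never as file content, so opening it with `FileReader` throws.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class PathVersusContent {
    // Returns true only if a FileReader can be opened on a File built from the given string.
    static boolean canOpenAsFile(String pathOrContent) {
        File f = new File(pathOrContent); // the string is interpreted as a path, not as content
        try (FileReader reader = new FileReader(f)) {
            return true;
        } catch (FileNotFoundException e) {
            // No file exists at that "path". For very long strings the message
            // may read "File name too long", depending on the operating system.
            return false;
        } catch (IOException e) {
            return false; // only reachable if closing the reader fails
        }
    }

    public static void main(String[] args) {
        String content = "This is a dummy sentence.\r\nHello, this is one too.\r\n";
        System.out.println(canOpenAsFile(content));
    }
}
```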
Try giving the file a short sample name and writing the output into it. Here is an example.
String fixedspace = fixme(fixread);
File out= new File("output.txt");
FileWriter fr = new FileWriter(out);
fr.write(fixedspace);
fr.close(); // flush the buffered text to disk before the file is read back
Then read from that file and continue.
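The write-then-read flow described above could look roughly like the following. This is a sketch under assumptions, not the asker's code: the class `SentenceFile` and the methods `splitSentences` and `writeAndReadBack` are hypothetical names, and the lookbehind split is a swapped-in alternative to the `Matcher` loop in the question. It uses `java.nio.file.Files`, which closes the file for you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SentenceFile {
    // Splits after '.', '?' or '!' when whitespace follows; the punctuation stays with its sentence.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.?!])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Writes one sentence per line to 'out', then reads those lines back.
    static List<String> writeAndReadBack(String text, Path out) throws IOException {
        Files.write(out, splitSentences(text), StandardCharsets.UTF_8);
        return Files.readAllLines(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "This is a dummy sentence. Hello, this is one too. And this one.";
        Path out = Paths.get("output.txt"); // a real, short file name
        for (String line : writeAndReadBack(text, out)) {
            System.out.println(line);
        }
    }
}
```

If the intermediate file is not actually needed, wrapping the string in `new BufferedReader(new StringReader(fixedspace))` should let the existing `readLine()` loop run directly on the in-memory text.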
For more on "java - How to fix punctuation in a text file?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/34487274/