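java - How to fix punctuation in a text file?

Tags: java text stream

I am currently working on an independent project, but I am having trouble converting a text file into the correct format. Right now my program reads one line at a time and assumes that one line = one sentence, which is a problem because someone can insert a paragraph with punctuation scattered throughout. What I want to do is put each sentence on its own line and then read from that file. I didn't want to come empty-handed, so I tried the only approach I could think of. I got it working on short strings, but once I moved to longer text files I had to use Streams and ran into an issue: (File name too long)

Example:

Input: This is a dummy sentence. Hello, this is also one. And this too.

Output:

This is a dummy sentence.

Hello, this is also one.

And this too.

This works: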

public static void main(String[] args)
{
    String text = "Joanne had one requirement: Her child must be" +
                  " adopted by college graduates. So the doctor arranged" +
                  " for the baby to be placed with a lawyer and his wife." +
                  " Paul and Clara named their new baby Steven Paul Jobs.";
    // Matches any one of the marks ? . ! ¡ ¿
    Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
    Matcher matcher = pattern.matcher(text);
    String withline = "";
    int starter = 0;    // index where the current sentence begins
    String overall = "";

    while (matcher.find())
    {
        int holder = matcher.start();    // index of the punctuation mark
        System.out.println("=========> " + holder);

        // Copy the sentence, including its punctuation mark, then end the line.
        withline = text.substring(starter, holder + 1);
        withline = withline + "\r\n";
        overall = overall + withline;
        System.out.println(withline);
        // Skip the mark and the single space assumed to follow it.
        starter = holder + 2;
    }
    System.out.println(overall);
    //return overall;
}
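For comparison, the same sentence-per-line split can be sketched with a single regular expression. This is not the asker's code: the class name `SentenceSplitter` and its `split` method are hypothetical, and the sketch assumes that a sentence ends at `.`, `?` or `!` followed by whitespace. Unlike the loop above, it keeps trailing text that has no closing punctuation mark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the asker's project.
class SentenceSplitter {
    // Split on whitespace that directly follows '.', '?' or '!'.
    // The lookbehind keeps the punctuation mark attached to its sentence.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[.?!])\\s+");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : BOUNDARY.split(text.trim())) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is a dummy sentence. Hello, this is also one. And this too.";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

Abbreviations such as "Mr. Smith" would still be split in the wrong place under this assumption, so treat it as a starting point only.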

This causes the problem:

                public static void main(String[] args) throws IOException
                {
                    final String INPUT_FILE = "practice.txt";
                    InputStream in = new FileInputStream(INPUT_FILE);
                    String fixread = getStringFromInputStream(in);
                    String fixedspace = fixme(fixread);
                    File ins = new File(fixedspace);
                    BufferedReader reader = new BufferedReader(new FileReader(ins));
                    Pattern p = Pattern.compile("\n");
                    String line, sentence;
                    String[] t;
                    while ((line = reader.readLine()) != null )
                    {
                        t = p.split(line);  /**hold curr sentence and remove it from OG txt file since you will reread.*/
                        sentence = t[0]; 
                        indiv_sentences.add(sentence);   
                    }
                    //putSentencestoTrie(indiv_sentences);
                    //runAutocompletealt();
                }



            private static String fixme(String fixread) 
            {
                Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
                String actString = fixread.toString();
                Matcher matcher = pattern.matcher(actString);
                String withline = ""; 
                int starter = 0; 
                String overall = "";
                while (matcher.find()) 
                {
                    int holder = matcher.start(); 
                    withline = actString.substring(starter, holder + 1); 
                    withline = withline + "\r\n";
                    overall = overall + withline; 
                    starter = holder + 2;
                }

                    return overall;
                }

            /**this is not my code, this was provided by an outside source, I do not take credit*/
            /**http://www.mkyong.com/java/how-to-convert-inputstream-to-string-in-java/*/
            private static String getStringFromInputStream(InputStream is) {

                BufferedReader br = null;
                StringBuilder sb = new StringBuilder();

                String line;
                try {

                    br = new BufferedReader(new InputStreamReader(is));
                    while ((line = br.readLine()) != null) {
                        sb.append(line);
                    }

                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    if (br != null) {
                        try {
                            br.close();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                }

                return sb.toString();

            }



https://github.com/ChristianCSE/Phrase-Finder

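I'm pretty sure this is all the code I used for this part, but in case you need to see the rest of my code, I have provided a link to my repository. Thanks!

Best answer

The problem is that you are creating a File whose name is the text you meant to use as its content, and that text is far too long to be a file name.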

 String fixedspace =  fixme(fixread);
 File ins = new File(fixedspace);//this is the issue, you gave the content as its name 
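To make the failure concrete, here is a minimal sketch; the class `FileNameDemo` and the method `openFails` are hypothetical names used only for illustration. `new File(String)` interprets its argument as a path and writes nothing, so opening the resulting `File` for reading fails with a `FileNotFoundException`. The exact message depends on the operating system; on many systems it mentions "File name too long" for very long names.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical demo class; not part of the asker's project.
class FileNameDemo {
    // Returns true when a File built from `text` cannot be opened for reading.
    static boolean openFails(String text) throws IOException {
        File f = new File(text); // `text` becomes the path; no content is written
        try (FileReader reader = new FileReader(f)) {
            return false; // only reached if a file with that exact name exists
        } catch (FileNotFoundException e) {
            // The message is OS-dependent.
            System.out.println(e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            content.append("Joanne had one requirement.\r\n");
        }
        System.out.println("open failed: " + openFails(content.toString()));
    }
}
```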

Try giving the file a simple name and writing the output into it. Here is an example.

String fixedspace = fixme(fixread);
File out = new File("output.txt");
// try-with-resources closes the writer, which flushes the text to disk
try (FileWriter fr = new FileWriter(out)) {
    fr.write(fixedspace);
}

Then read the file back and continue.
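The write-then-read round trip suggested above could look roughly like the sketch below. It uses `java.nio.file.Files` rather than `FileWriter`, which is a substitution made here and not something the answer prescribes. The class `WriteThenRead`, the method `roundTrip` and the sample text are hypothetical, and `output.txt` is simply the example name from the answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the asker's project.
class WriteThenRead {
    // Writes the already-split text to `out`, then reads it back line by line.
    static List<String> roundTrip(String fixedText, Path out) throws IOException {
        Files.write(out, fixedText.getBytes(StandardCharsets.UTF_8));

        List<String> sentences = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    sentences.add(line); // one sentence per line
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Sample text standing in for the result of fixme(...)
        String fixed = "First sentence.\r\nSecond one?\r\n";
        System.out.println(roundTrip(fixed, Paths.get("output.txt")));
    }
}
```

If the intermediate file is not actually needed, wrapping the string directly with `new BufferedReader(new StringReader(fixedspace))` would avoid the disk round trip altogether.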

Regarding "java - How to fix punctuation in a text file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34487274/
