Keeping Java from crashing on a double comma ("malformed line")

Tags: java regex numberformatexception

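My task is to process the lines of a text file based on the address and sort them into their categories: "East", "West", "Broadway", "Avenue" and "Bad ID". The code below does this 100% correctly until it reaches a malformed line that contains a double comma. I could replace every double comma with a single comma, but that does not fully solve the problem: the line should still be treated as "malformed" and added to the badId category, yet instead it throws a NumberFormatException that aborts the code below entirely. Is there a way to get past the double comma without triggering this exception, so that the rest of the file is still parsed and this line is added to the badId list as expected?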

Input text file

123-ABC-4567, 15 W. 15th St., 50.1
456-BGT-9876,22 Broadway,24
QAZ-456-QWER, 100 East 20th Street,50
Q2Z-457-QWER, 200 East 20th Street, 49
678-FGH-9845 ,,45 5th Ave, 12.2,
678-FGH-9846 ,45 5th Ave, 12.2

123-ABC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24

Error

java.lang.NumberFormatException: For input string: "45 5th Ave"
    at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
    at sun.misc.FloatingDecimal.parseFloat(Unknown Source)
    at java.lang.Float.parseFloat(Unknown Source)
    at java.lang.Float.valueOf(Unknown Source)
    at csi311.HelloCsi311.readFile(HelloCsi311.java:99)
    at csi311.HelloCsi311.run(HelloCsi311.java:28)
    at csi311.HelloCsi311.main(HelloCsi311.java:240)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at edu.rice.cs.drjava.model.compiler.JavacCompiler.runCommand(JavacCompiler.java:267)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at edu.rice.cs.dynamicjava.symbol.JavaClass$JavaMethod.evaluate(JavaClass.java:362)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.handleMethodCall(ExpressionEvaluator.java:92)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.visit(ExpressionEvaluator.java:84)
    at koala.dynamicjava.tree.StaticMethodCall.acceptVisitor(StaticMethodCall.java:121)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.value(ExpressionEvaluator.java:38)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.value(ExpressionEvaluator.java:37)
    at edu.rice.cs.dynamicjava.interpreter.StatementEvaluator.visit(StatementEvaluator.java:106)
    at edu.rice.cs.dynamicjava.interpreter.StatementEvaluator.visit(StatementEvaluator.java:29)
    at koala.dynamicjava.tree.ExpressionStatement.acceptVisitor(ExpressionStatement.java:101)
    at edu.rice.cs.dynamicjava.interpreter.StatementEvaluator.evaluateSequence(StatementEvaluator.java:66)
    at edu.rice.cs.dynamicjava.interpreter.Interpreter.evaluate(Interpreter.java:77)
    at edu.rice.cs.dynamicjava.interpreter.Interpreter.interpret(Interpreter.java:47)
    at edu.rice.cs.drjava.model.repl.newjvm.InterpreterJVM.interpret(InterpreterJVM.java:249)
    at edu.rice.cs.drjava.model.repl.newjvm.InterpreterJVM.interpret(InterpreterJVM.java:222)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Code

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.ArrayList;
/**
 * Hello world example.  Shows passing in command line arguments, in this case a filename. 
 * If the filename is given, read in the file and echo it to stdout.
 */
public class HelloCsi311 {

    /**
     * Class constructor.
     */
    public HelloCsi311() {
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    public void run(String filename) throws Exception {
     if (filename != null) {
      readFile(filename); 
     }
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    private void readFile(String filename) throws Exception {
     System.out.println("Processing file: " + filename); 
     // Open the file and connect it to a buffered reader.
     BufferedReader br = new BufferedReader(new FileReader(filename));  
     ArrayList<String> foundaddr = new ArrayList<String>();
     ArrayList<String> broadway = new ArrayList<String>();
     ArrayList<String> ave = new ArrayList<String>();
     ArrayList<String> east = new ArrayList<String>();
     ArrayList<String> west = new ArrayList<String>();
     ArrayList<String> overlb = new ArrayList<String>();
     ArrayList<String> badId = new ArrayList<String>();
     String line = null;  
     String pattern = "^\\d\\d\\d-[A-Za-z][A-Za-z][A-Za-z]-\\d\\d\\d\\d";
     String west1 = "^\\d{1,4}\\s\\b(West|west)\\b\\s\\d{1,3}\\w+\\s\\b(St|st)\\B";
     String west2 = "^\\d{1,4}\\s\\b(W|w)\\.\\s\\d{1,3}\\w+\\s\\b(St|st)\\.";
     String west3 = "^\\d{1,4}\\s\\b(W|w)\\s\\d{1,3}\\w+\\s\\b(St|st)";
     String east1 = "^\\d{1,4}\\s\\b(East|east)\\b\\s\\d{1,3}\\w+\\s\\b(St|st)\\B";
     String east2 = "^\\d{1,4}\\s\\b(E|e)\\.\\s\\d{1,3}\\w+\\s\\b(St|st)";
     String east3 = "^\\d{1,4}\\s\\b(E|e)\\s\\d{1,3}\\w+\\s\\b(St|st)";
     String broad1 = "^\\d{1,4}\\s\\b(B|b)\\B(Way|way)";
     String broad2 = "^\\d{1,4}\\s\\b(B|b)\\b(.|')(Way|way)";
     String broad3 = "^\\d{1,4}\\s\\b(Broadway|broadway)";
     String avenue1 = "^\\d{1,4}\\s\\w+\\s\\b(Ave|ave)";
     String avenue2 = "^\\d{1,4}\\s\\w+\\s\\b(Ave.|ave.)";
     String avenue3 = "^\\d{1,4}\\s\\w+\\s\\b(Avenue|avenue)";
     Pattern r = Pattern.compile(pattern);
     Pattern z = Pattern.compile(east1);
     Pattern zz = Pattern.compile(east2);
     Pattern zzz = Pattern.compile(east3);
     Pattern we = Pattern.compile(west1);
     Pattern wee = Pattern.compile(west2);
     Pattern weee = Pattern.compile(west3);
     Pattern broadc = Pattern.compile(broad1);
     Pattern broadcc = Pattern.compile(broad2);
     Pattern broadccc = Pattern.compile(broad3);
     Pattern avec = Pattern.compile(avenue1);
     Pattern avecc = Pattern.compile(avenue2);
     Pattern aveccc = Pattern.compile(avenue3);
     // Get lines from the file one at a time until there are no more.
     while ((line = br.readLine()) != null) {
       if(line.trim().isEmpty()) {
         continue;
       }
       String sample = line.replaceAll("\\s+,", ",").replaceAll(",+\\s",",");
       String[] result = sample.split(",");
       String pkgId = result[0].trim().toUpperCase();
       String pkgAddr = result[1].trim();
       // System.out.println(sample);
       //System.out.println(pkgId);
       //System.out.println(pkgAddr);
       Matcher easts = z.matcher(pkgAddr);
       Matcher eastss = zz.matcher(pkgAddr);
       Matcher eastsss = zzz.matcher(pkgAddr);
       Matcher wests = we.matcher(pkgAddr);
       Matcher westss = wee.matcher(pkgAddr);
       Matcher westsss = weee.matcher(pkgAddr);
       Matcher broadways = broadc.matcher(pkgAddr);
       Matcher broadwayss = broadcc.matcher(pkgAddr);
       Matcher broadwaysss = broadccc.matcher(pkgAddr);
       Matcher avenues = avec.matcher(pkgAddr);
       Matcher avenuess = avecc.matcher(pkgAddr);
       Matcher avenuesss = aveccc.matcher(pkgAddr);
       Float f = Float.valueOf(result[2]);
       for(String str : result){
         //System.out.println(str);
         // Trying to match for different types
         Matcher m = r.matcher(str);
         // REMEMBER TO ADD BROADWAYS AND AVENUES HERE TOO AND FIX SO IT DOESNT HAVE ALL THE IDS
         if (!pkgId.matches(pattern) || !pkgAddr.matches(west1) && !pkgAddr.matches(west2) && !pkgAddr.matches(west3)
            && !pkgAddr.matches(east1) && !pkgAddr.matches(east2) && !pkgAddr.matches(east3) && !pkgAddr.matches(broad1) 
            && !pkgAddr.matches(broad2) && !pkgAddr.matches(broad3) && !pkgAddr.matches(avenue1) && !pkgAddr.matches(avenue2)
            && !pkgAddr.matches(avenue3)) {
           if(!badId.contains(pkgId)){
             badId.add(pkgId);
           }
           //System.out.println(pkgId);
         } 
         if(f < 50){
           //System.out.println(str);
           if(m.find()) {
             //System.out.println(str);
             //System.out.println(pkgAddr);
             if(avenues.find() || avenuess.find() || avenuesss.find()){
               if(!ave.contains(pkgAddr)){
                 ave.add(pkgAddr);
               }
             }

             if(broadways.find() || broadwayss.find() || broadwaysss.find()){
               if(!broadway.contains(pkgAddr)){
                 broadway.add(pkgAddr);
               }
             }

             if(easts.find() || eastss.find() || eastsss.find()){
               if(!east.contains(pkgAddr)){
                 east.add(pkgAddr);
               }
             }

             if(wests.find() || westss.find() || westsss.find()){
               if(!west.contains(pkgAddr)){
                 west.add(pkgAddr);
               }
             }
           }
           //System.out.println(str);
         }
         if(f > 50){
           if(avenues.find() || avenuess.find() || avenuesss.find()){
             if(!ave.contains(pkgAddr)){
               ave.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 overlb.add(pkgId);
               }
             }
           }

           if(broadways.find() || broadwayss.find() || broadwaysss.find()){
             if(!broadway.contains(pkgAddr)){
               broadway.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 overlb.add(pkgId);
               }
             }
           }
           // System.out.println(str);
           if(easts.find() || eastss.find() || eastsss.find()){
             if(!east.contains(pkgAddr)){
               east.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 //System.out.println(pkgId);
                 overlb.add(pkgId);
               }
             }
           }
           if(wests.find() || westss.find() || westsss.find()){
             if(!west.contains(pkgAddr)){
               west.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 //System.out.println(pkgId);
                 overlb.add(pkgId);
               }
             }
           }
           //System.out.println(str);
         }
       }

     }

     if(west != null) {
      // System.out.println(east);
       System.out.println("West: " + west.size());
     }

     if(east != null){
      // System.out.println(west);
       System.out.println("East: " + east.size());
     } 

     if(ave != null){
       //System.out.println(ave);
       System.out.println("Ave: " + ave.size());
     }

     if(broadway != null){
       //System.out.println(broadway);
       System.out.println("Bway: " + broadway.size());
     }

      if(overlb != null){
       // System.out.println(overlb);
        System.out.println(">50lbs: " + overlb.size());
      }

      if(badId != null){
        System.out.println("Ids?: " + badId);

      }


     // Close the buffer and the underlying file.
     br.close();
    }



    /**
     * @param args filename
     */
    public static void main(String[] args) {
     // Make an instance of the class.
     HelloCsi311 theApp = new HelloCsi311();
     String filename = null; 
     // If a command line argument was given, use it as the filename.
     if (args.length > 0) {
      filename = args[0]; 
     }
     try { 
      // Run the run(), passing in the filename, null if not specified.
      theApp.run(filename);
     }
     catch (Exception e) {
      // If anything bad happens, report it.
      System.out.println("Something bad happened!");
      e.printStackTrace();
     }

    }
}

Expected output

Processing file: test.in
West:   1
East:   0
Ave:    1
Bway:   2
>50lbs: 1
Ids?:   [QAZ-456-QWER, Q2Z-457-QWER, 678-FGH-9845, 123-ABC-9999]

Best answer

The problem is here:

Float f = Float.valueOf(result[2]);

Here you are trying to convert the element at index 2 (the third field) to a Float.

For the first four lines of data the conversion works, because the values being converted are 50.1, 24, 50 and 49.

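However, the "double comma" is parsed as an empty string, which shifts every later field one position to the right (the cleanup in readFile() only collapses commas that are followed by whitespace, and ",,45" is not). The value being converted is therefore 45 5th Ave, and Float.valueOf throws a NumberFormatException.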
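To see where the empty field comes from, the sketch below runs two lines from the input file through the same two replaceAll() calls used in readFile() and then through split(","). The class name SplitDemo and the helper clean() are illustrative names, not part of the original program.

```java
import java.util.Arrays;

/**
 * Reproduces the failure outside the full program. The cleanup regexes are
 * copied from readFile(); the two input lines come from the question's file.
 */
public class SplitDemo {

    // Same cleanup as readFile(): remove whitespace before a comma, then turn
    // "one or more commas followed by whitespace" into a single comma.
    static String clean(String line) {
        return line.replaceAll("\\s+,", ",").replaceAll(",+\\s", ",");
    }

    public static void main(String[] args) {
        String ok = "Q2Z-457-QWER, 200 East 20th Street, 49";
        String bad = "678-FGH-9845 ,,45 5th Ave, 12.2,";

        String[] okFields = clean(ok).split(",");
        System.out.println(Arrays.toString(okFields));  // [Q2Z-457-QWER, 200 East 20th Street, 49]
        System.out.println(Float.valueOf(okFields[2])); // 49.0

        // ",,45" has no whitespace after the commas, so the second regex leaves it
        // alone, and split(",") produces an empty string at index 1.
        String[] badFields = clean(bad).split(",");
        System.out.println(Arrays.toString(badFields)); // [678-FGH-9845, , 45 5th Ave, 12.2]

        try {
            Float.valueOf(badFields[2]); // badFields[2] is "45 5th Ave"
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException for: " + badFields[2]);
        }
    }
}
```

Note that split(",") drops trailing empty strings, which is why the trailing comma on the bad line does not show up as a fifth field.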

The following part was added after a question in the comments about filtering empty values out of the array:

You can filter the empty values out of the array with the following code (Java 8 and later):

// needs: import java.util.Arrays;
String[] filteredResult = Arrays.stream(result).filter(o -> !o.isEmpty()).toArray(String[]::new);
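A quick check of that one-liner on the fields of the double-comma line, typed in by hand as they come out of the question's cleanup and split(","). FilterDemo and dropEmpty() are illustrative names only.

```java
import java.util.Arrays;

public class FilterDemo {

    // The answer's one-liner: keep only the non-empty fields (Java 8+).
    static String[] dropEmpty(String[] result) {
        return Arrays.stream(result).filter(o -> !o.isEmpty()).toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Fields of "678-FGH-9845 ,,45 5th Ave, 12.2," after cleanup and split(",")
        String[] result = {"678-FGH-9845", "", "45 5th Ave", "12.2"};
        String[] filteredResult = dropEmpty(result);

        System.out.println(Arrays.toString(filteredResult)); // [678-FGH-9845, 45 5th Ave, 12.2]
        System.out.println(Float.valueOf(filteredResult[2])); // 12.2
    }
}
```

After filtering, index 2 holds the weight again, so Float.valueOf no longer throws for this line.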

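That said, this solution only targets the specific problem you are facing here and may not be a good solution in general. In particular, it silently repairs the line: with the empty field removed, 678-FGH-9845 is treated as a valid entry instead of being added to badId, which the expected output requires.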

The real solution is to clean up (validate) the data before you start parsing it.
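One way to "clean the data first" is to validate every line before classifying it and to send anything that fails validation to badId. The sketch below assumes (this is not stated in the assignment) that a well-formed line has exactly three comma-separated fields, none of them blank, and that the third one is a number; LineValidator, Parsed and parse() are hypothetical names. Using split(",", -1) keeps trailing empty fields, so a line ending in a stray comma is rejected as well.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: validate each line before classifying it, so a malformed line is
 * recorded as a bad ID instead of crashing the whole run.
 * Assumption: a well-formed line has exactly three non-blank,
 * comma-separated fields and the third one is a number.
 */
public class LineValidator {

    // Hypothetical holder for one parsed line
    static final class Parsed {
        final String id;
        final String address;
        final float weight;

        Parsed(String id, String address, float weight) {
            this.id = id;
            this.address = address;
            this.weight = weight;
        }
    }

    /** Returns the parsed line, or null if the line is malformed. */
    static Parsed parse(String line) {
        // limit -1 keeps trailing empty fields, so "..., 12.2," is not accepted
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            return null;
        }
        for (String field : fields) {
            if (field.trim().isEmpty()) {
                return null;
            }
        }
        try {
            float weight = Float.parseFloat(fields[2].trim());
            return new Parsed(fields[0].trim().toUpperCase(), fields[1].trim(), weight);
        } catch (NumberFormatException e) {
            return null; // third field is not a number
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "678-FGH-9845 ,,45 5th Ave, 12.2,",
            "678-FGH-9846 ,45 5th Ave, 12.2"
        };
        List<String> badId = new ArrayList<>();
        for (String line : lines) {
            Parsed p = parse(line);
            if (p == null) {
                // Use the first field as the ID, as the question's code does
                String id = line.split(",", -1)[0].trim().toUpperCase();
                if (!badId.contains(id)) {
                    badId.add(id);
                }
                continue; // skip classification and keep reading the file
            }
            System.out.println(p.id + " | " + p.address + " | " + p.weight);
        }
        System.out.println("Ids?: " + badId);
    }
}
```

In readFile() this check could take the place of the unconditional Float.valueOf(result[2]): when parse() returns null, record the ID and continue with the next line; otherwise classify p.address and p.weight as before.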

Original question on Stack Overflow: https://stackoverflow.com/questions/54509627/
