java - Scanning data and adding it to an array when there is no specified delimiter

Tags: java arrays regex string

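I have an assignment that requires me to process the following data from a txt file. There is no specified delimiter that would make it easier to sort the data into an array. I can use the Scanner class to read the text file and place the values into an array like this: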

for (int rows = 0; rows < array.length; rows++) {
    array[rows][0] = fileIn.next();
    array[rows][1] = fileIn.next();
    // ... remaining columns
}

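and so on. The names are harder, however, because they contain a varying number of spaces and may consist of a varying number of words. I want a full name such as "Allison, Mrs. Hudson J C (Bessie Waldo Daniels)" stored as a single element. I'm not sure where to start, but one idea is to have the program check for "male" || "female" so that it knows when to begin a new element. Any help would be appreciated.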

1   1   Allen, Miss. Elisabeth Walton   female  29  211.3375
1   1   Allison, Master. Hudson Trevor  male    0.9167  151.5500
1   0   Allison, Miss. Helen Loraine    female  2   151.5500
1   0   Allison, Mr. Hudson Joshua Creighton    male    30  151.5500
1   0   Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female  25  151.5500
1   1   Anderson, Mr. Harry male    48  26.5500
1   1   Andrews, Miss. Kornelia Theodosia   female  63  77.9583
1   0   Andrews, Mr. Thomas Jr  male    39  0.0000
1   1   Appleton, Mrs. Edward Dale (Charlotte Lamson)   female  53  51.4792
1   0   Artagaveytia, Mr. Ramon male    71  49.5042
1   0   Astor, Col. John Jacob  male    47  227.5250
1   1   Astor, Mrs. John Jacob (Madeleine Talmadge Force)   female  18  227.5250
1   1   Aubart, Mme. Leontine Pauline   female  24  69.3000
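
The idea floated in the question (collect name tokens until "male" or "female" appears) can be sketched roughly as follows. This is only an illustration under assumptions: the class name TokenParser and the method parseLine are made up for this example, and it assumes every record has exactly two fields before the name and two after the sex column.

```java
import java.util.Scanner;

class TokenParser {
    // Splits one record into six fields. The name is everything between the
    // second field and the first "male"/"female" token (assumed layout).
    static String[] parseLine(String line) {
        Scanner tokens = new Scanner(line); // default delimiter: any whitespace
        String first = tokens.next();
        String second = tokens.next();

        StringBuilder name = new StringBuilder();
        String sex = null;
        while (tokens.hasNext()) {
            String t = tokens.next();
            if (t.equals("male") || t.equals("female")) {
                sex = t; // the name ends here
                break;
            }
            if (name.length() > 0) {
                name.append(' ');
            }
            name.append(t);
        }
        if (sex == null) {
            throw new IllegalArgumentException("No male/female token in: " + line);
        }

        String age = tokens.next();
        String fare = tokens.next();
        return new String[] { first, second, name.toString(), sex, age, fare };
    }

    public static void main(String[] args) {
        String[] row = parseLine(
            "1   0   Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female  25  151.5500");
        for (String field : row) {
            System.out.println(field);
        }
    }
}
```

One limitation of this sketch: runs of whitespace inside the name are collapsed to a single space, and a name that itself contains the word "male" or "female" would be cut short.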

Best answer

This is a good fit for a regular expression. The following pattern matches your sample data:

([\d]) +([\d]) +(.+\S) +(female|male) +([\d.]+)  +([\d.]+)

Here is a complete Java example (originally hosted on repl.it):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main( String args[] ){
        String text = 
            "1   1   Allen, Miss. Elisabeth Walton   female  29  211.3375\n"+
            "1   1   Allison, Master. Hudson Trevor  male    0.9167  151.5500\n"+
            "1   0   Allison, Miss. Helen Loraine    female  2   151.5500\n"+
            "1   0   Allison, Mr. Hudson Joshua Creighton    male    30  151.5500\n"+
            "1   0   Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female  25  151.5500\n"+
            "1   1   Anderson, Mr. Harry male    48  26.5500\n"+
            "1   1   Andrews, Miss. Kornelia Theodosia   female  63  77.9583\n"+
            "1   0   Andrews, Mr. Thomas Jr  male    39  0.0000\n"+
            "1   1   Appleton, Mrs. Edward Dale (Charlotte Lamson)   female  53  51.4792\n"+
            "1   0   Artagaveytia, Mr. Ramon male    71  49.5042\n"+
            "1   0   Astor, Col. John Jacob  male    47  227.5250\n"+
            "1   1   Astor, Mrs. John Jacob (Madeleine Talmadge Force)   female  18  227.5250\n"+
            "1   1   Aubart, Mme. Leontine Pauline   female  24  69.3000\n";

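        // Split the sample text into individual records.
        String[] lines = text.split("\\r?\\n");

        // Groups: 1 and 2 are the single-digit columns, 3 is the full name
        // (greedy, but it must end in a non-space character), 4 is the sex,
        // 5 and 6 are the two numeric columns. Note the last separator
        // requires at least two spaces, which holds for the sample data.
        String pattern = "([\\d]) +([\\d]) +(.+\\S) +(female|male) +([\\d.]+)  +([\\d.]+)";
        Pattern r = Pattern.compile(pattern);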

        for (String l : lines) {
            Matcher m = r.matcher(l);
            if (m.find( )) {
                System.out.println(" ------------------- New Text Line -------------------");
                System.out.println("Group 1: " + m.group(1) );
                System.out.println("Group 2: " + m.group(2) );
                System.out.println("Group 3: " + m.group(3) );
                System.out.println("Group 4: " + m.group(4) );
                System.out.println("Group 5: " + m.group(5) );
                System.out.println("Group 6: " + m.group(6) );
            } else {
                System.out.println("Line did not match");
            }   
        }
    }
}

This produces output like the following:

 ------------------- New Text Line -------------------
Group 1: 1
Group 2: 1
Group 3: Allen, Miss. Elisabeth Walton
Group 4: female
Group 5: 29
Group 6: 211.3375
 ------------------- New Text Line -------------------
Group 1: 1
Group 2: 1
Group 3: Allison, Master. Hudson Trevor
Group 4: male
Group 5: 0.9167
Group 6: 151.5500
 ------------------- New Text Line -------------------
Group 1: 1
Group 2: 0
Group 3: Allison, Miss. Helen Loraine
Group 4: female
Group 5: 2
Group 6: 151.5500
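
Since the question asks for the fields in a two-dimensional array, the captured groups can be copied into a String[][] instead of being printed. The following is a minimal sketch, not part of the original answer: the class name PassengerTable and the method toTable are hypothetical, and the pattern uses \s+ between fields on the assumption that the real file may contain tabs rather than spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PassengerTable {
    // Same grouping as the pattern above, but any whitespace separates fields
    // (assumption: the real file may be tab-separated).
    private static final Pattern ROW = Pattern.compile(
        "(\\d)\\s+(\\d)\\s+(.+\\S)\\s+(female|male)\\s+([\\d.]+)\\s+([\\d.]+)");

    // Returns one String[6] per line that fits the pattern; other lines are skipped.
    static String[][] toTable(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (!m.find()) {
                continue;
            }
            String[] row = new String[m.groupCount()];
            for (int g = 1; g <= m.groupCount(); g++) {
                row[g - 1] = m.group(g);
            }
            rows.add(row);
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1   1   Anderson, Mr. Harry male    48  26.5500",
            "1   0   Andrews, Mr. Thomas Jr  male    39  0.0000");
        String[][] table = toTable(lines);
        for (String[] row : table) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

To read from an actual file, the list could come from java.nio.file.Files.readAllLines with the path to the txt file, rather than the inline sample used here.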

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/38288503/
