java - What is the appropriate regex for this pattern?

Tags: java regex file

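I have a text file that contains the course catalog for all CS classes. The class itself is easy to find, since it sits at the beginning of a line and starts with the CS prefix. The prerequisites are trickier for me, though. I can find the line that holds the prerequisites, but there can be one or several prerequisite classes, separated by commas and "and". Sometimes the text after the prerequisite sentence also mentions other class names that are not prerequisites themselves. Here is a sample of the prereq file: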

CS 4213. Computing for Bioinformatics. (3-0) 3 Credit Hours.
Prerequisite: CS 1173 or another programming course. Emphasizes computing tasks common in bioinformatics: variables, flow control, input/output, strings, pattern matching, arrays, hash tables, functions, access to databases, and parsing data from queries for common bioinformatics tasks. SQL, XML, and BioPerl. May not be applied to the 24 hours of required electives for computer science majors, but may be included for a computer science minor.
CS 4313. Automata, Computability, and Formal Languages. (3-0) 3 Credit Hours.
Prerequisites: CS 3341 and CS 3343. Discussion of abstract machines (finite state automata, pushdown automata, and Turing machines), formal grammars (regular, context-free, and type 0), and the relationship among them.
CS 4353. Unix and Network Security. (3-0) 3 Credit Hours.
Prerequisite: CS 3433. A technical survey of the fundamentals of computer and information security. Issues include cryptography, authentication, attack techniques at both the OS and network level, defense techniques, intrusion detection, scan techniques and detection, forensics, denial of service techniques and defenses, libpcap, libdnet and libnet programming.
CS 4363. Cryptography. (3-0) 3 Credit Hours.
Prerequisites: CS 3341, CS 3343, and CS 3433. A course in pure and applied cryptography, with emphasis on theory. Topics may include conventional and public-key cryptosystems, signatures, pseudo-random sequences, hash functions, key management, and threshold schemes.
CS 4383. Computer Graphics. (3-0) 3 Credit Hours.
Prerequisites: CS 2121, CS 2123, CS 3341, and CS 3343. An introduction to two- and three-dimensional generative computer graphics. Display devices, data structures, mathematical transformations, and algorithms used in picture generation, manipulation, and display.
CS 4393. User Interfaces. (3-0) 3 Credit Hours.
Prerequisite: CS 3443. Study of advanced user interface issues. User interface design, human factors, usability, GUI programming models, and the psychological aspects of human-computer interaction.
CS 4413. Web Technologies. (3-0) 3 Credit Hours.
Prerequisites: CS 3421 and CS 3423. Fundamentals of Web and component technology: markup languages, layout design, client and server side programming, database and Web integration.
CS 4593. Topics in Computer Science. (3-0) 3 Credit Hours.
Prerequisite: Consent of instructor. Advanced topics in an area of computer science. May be repeated for credit when topics vary.
CS 4633. Simulation. (3-0) 3 Credit Hours.
Prerequisites: CS 3341 and CS 3343. Design, execution, and analysis of simulation models, discrete event simulation techniques, input and output analysis, random numbers, and simulation tools and languages.
CS 4713. Compiler Construction. (3-0) 3 Credit Hours.
Prerequisites: CS 3341, CS 3343, CS 3841, and CS 3843. An introduction to implementation of translators. Topics include formal grammars, scanners, parsing techniques, syntax-directed translation, symbol table management, code generation, and code optimization. (Formerly titled “Compiler Writing.”).

This is what I have so far:

Pattern p = Pattern.compile("^(CS [0-9][0-9][0-9][0-9]).*");
Pattern p2 = Pattern.compile("^Prereq.* ([A-Z]* [0-9][0-9][0-9][0-9]).*");
while ((line = br.readLine()) != null) {
    Matcher m = p.matcher(line);
    if (m.find()) {
        System.out.println(m.group(1));
    }
    Matcher m2 = p2.matcher(line); 
    if (m2.find()) {
        System.out.println("Prereq: "+m2.group(1)+", Occurrences: "+m2.groupCount());
        //System.out.println(line);
    }
}

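So far this gets every class and a single prerequisite for each (judging by the output below, the last one listed on the line), or nothing if the class has no prerequisite course.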

Sample output:

CS 4213
Prereq: CS 1173, Occurrences: 1
CS 4313
Prereq: CS 3343, Occurrences: 1
CS 4353
Prereq: CS 3433, Occurrences: 1
CS 4363
Prereq: CS 3433, Occurrences: 1
CS 4383
Prereq: CS 3343, Occurrences: 1
CS 4393
Prereq: CS 3443, Occurrences: 1
CS 4413
Prereq: CS 3423, Occurrences: 1
CS 4593
CS 4633
Prereq: CS 3343, Occurrences: 1
CS 4713
Prereq: CS 3843, Occurrences: 1

For example, for CS 4313 I want both CS 3341 and CS 3343.

Best answer

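It should be easier with three patterns. A capturing group holds only one match per call to find(), and groupCount() reports the number of groups in the pattern rather than the number of matches, so p2 can never yield more than one course per line. Instead, use one pattern to spot the class line, one to spot the prerequisite line, and a third that is applied repeatedly with find() to pull out each course code: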

    // Assumes `br` is an open BufferedReader and `line` is a declared String.
    Pattern p = Pattern.compile("^(CS [0-9][0-9][0-9][0-9]).*");  // class line
    Pattern p2 = Pattern.compile("^Prereq");                      // prerequisite line
    Pattern p3 = Pattern.compile("[A-Z]+ [0-9]{4}");              // a single course code
    while ((line = br.readLine()) != null) {
        Matcher m = p.matcher(line);
        if (m.find()) {
            System.out.println(m.group(1));
        }
        Matcher m2 = p2.matcher(line);
        if (m2.find()) {
            // Each find() call advances to the next course code on the line.
            final Matcher m3 = p3.matcher(line);
            while (m3.find()) {
                System.out.println("Prereq: " + m3.group(0));
            }
        }
    }
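
The loop above scans the whole prerequisite line, so a course code mentioned later in the description would also be reported as a prerequisite, which is the case the question worries about. One possible way around that, sketched here under the assumption that the prerequisite list always ends at the first period on the line (true for the sample data, but not guaranteed for every catalog), is to capture only that first sentence and search it for course codes. The class name `PrereqExtractor`, the method `extractPrereqs`, and the sample line in `main` are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrereqExtractor {
    // Captures the text between "Prerequisite:"/"Prerequisites:" and the first period.
    // Assumption: the prerequisite sentence itself contains no period.
    private static final Pattern PREREQ_SENTENCE =
            Pattern.compile("^Prerequisites?:\\s*([^.]*)\\.");

    // A course code such as "CS 3341": uppercase prefix, a space, four digits.
    private static final Pattern COURSE_CODE =
            Pattern.compile("\\b[A-Z]+ [0-9]{4}\\b");

    // Returns the course codes found in the prerequisite sentence of the line,
    // or an empty list if the line is not a prerequisite line.
    public static List<String> extractPrereqs(String line) {
        List<String> codes = new ArrayList<>();
        Matcher sentence = PREREQ_SENTENCE.matcher(line);
        if (!sentence.find()) {
            return codes;
        }
        Matcher code = COURSE_CODE.matcher(sentence.group(1));
        while (code.find()) {
            codes.add(code.group());
        }
        return codes;
    }

    public static void main(String[] args) {
        // Hypothetical line: the trailing sentence names a course that is not a prerequisite.
        String line = "Prerequisites: CS 3341 and CS 3343. Credit cannot be earned for both this course and CS 9999.";
        System.out.println(extractPrereqs(line)); // prints [CS 3341, CS 3343]
    }
}
```

A line such as "Prerequisite: Consent of instructor." yields an empty list, because the first sentence contains no course code. If the catalog ever puts an abbreviation with a period inside the prerequisite list, the first-period assumption breaks and the sentence pattern would need to be adjusted.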

Regarding "java - What is the appropriate regex for this pattern?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29958343/
