java - Grouping strings into an array

I have these strings:

wordsExpanded="test |  is |  [(thirty four) {<number_type_0 words>}( 3  4 ) {<number_type_0 digits>}] |  test |  [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] |  [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]"

interpretation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}"

The output I need is a string like this:

finalOutput="test |  is | thirty four | test | 3 | 1 "

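Basically, the interpretation string contains the information needed to determine which alternative of each group was used. For the first group we used <number_type_0 words>, so the correct string is "(thirty four)" and not "( 3  4 )". The second one is "( 3 )", and then "( 1 )".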
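
Put another way, the interpretation string works as a lookup table from each group's type to the alternative to keep. Here is a minimal sketch of reading it that way. It assumes every tag has the form <type words> or <type digits>. The class InterpretationMap and its parse method are hypothetical names, not part of the question's code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterpretationMap {

    // One tag such as <number_type_2 digits>: group 1 is the type, group 2 the mode.
    private static final Pattern TAG = Pattern.compile("<(\\w+) (words|digits)>");

    // Hypothetical helper: maps each group type to "words" or "digits", in tag order.
    public static Map<String, String> parse(String interpretation) {
        Map<String, String> modes = new LinkedHashMap<>();
        Matcher m = TAG.matcher(interpretation);
        while (m.find()) {
            modes.put(m.group(1), m.group(2));
        }
        return modes;
    }

    public static void main(String[] args) {
        String interpretation = "{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
        // prints {number_type_2=digits, number_type_1=digits, number_type_0=words}
        System.out.println(parse(interpretation));
    }
}
```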

Here is my code so far:

package com.test.prova;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Prova {

    public static void main(String[] args) {
        String nlInterpretation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
        String inputText="this is 34 test 3 1";
        String grammar="test is [(thirty four) {<number_type_0 words>}( 3  4 ) {<number_type_0 digits>}] test [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";

        List<String> matchList = new ArrayList<String>();
        Pattern regex = Pattern.compile("[^\\s\"'\\[]+|\\[([^\\]]*)\\]|'([^']*)'");
        Matcher regexMatcher = regex.matcher(grammar);
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                matchList.add(regexMatcher.group(1));
            } else if (regexMatcher.group(2) != null) {
                matchList.add(regexMatcher.group(2));
            } else {
                matchList.add(regexMatcher.group());
            }
        } 

        String[] xx = matchList.toArray(new String[0]);
        String[] yy = inputText.split(" ");

        matchList = new ArrayList<String>();
        regex = Pattern.compile("[^<]+|<([^>]*)>");
        regexMatcher = regex.matcher(nlInterpretation);
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                matchList.add(regexMatcher.group(1));
            }
        } 
        String[] zz = matchList.toArray(new String[0]);
        System.out.println(String.join(" | ",zz));

        for (int i=0; i<xx.length; i++) {
            if (xx[i].contains("number_type_")) {
                matchList = new ArrayList<String>();
                regex = Pattern.compile("[^\\(]+|<([^\\)]*)>.*[^<]+|<([^>]*)>");
                regexMatcher = regex.matcher(xx[i]);
                while (regexMatcher.find()) {
                    if (regexMatcher.group(1) != null) {
                        matchList.add(regexMatcher.group(1));
                    } else if (regexMatcher.group(2) != null) {
                        matchList.add(regexMatcher.group(2));
                    } else {
                        matchList.add(regexMatcher.group());
                    }
                } 
                System.out.println(String.join(" | ",matchList.toArray(new String[0])));
            }
            System.out.printf("%02d\t%s\t->%s\n", i, yy[i], xx[i]);
        }
    }
}

It produces the following output:

number_type_2 digits | number_type_1 digits | number_type_0 words
00  this    ->test
01  is  ->is
thirty four) {<number_type_0 words>} |  3  4 ) {<number_type_0 digits>}
02  34  ->(thirty four) {<number_type_0 words>}( 3  4 ) {<number_type_0 digits>}
03  test    ->test
three) {<number_type_1 words>} |  3 ) {<number_type_1 digits>}
04  3   ->(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}
one) {<number_type_2 words>} |  1 ) {<number_type_2 digits>}
05  1   ->(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}

What I want is something more like this:

number_type_2 digits | number_type_1 digits | number_type_0 words
00  this    ->test
01  is      ->is
02  34      ->thirty four
03  test    ->test
04  3       ->3
05  1       ->1

Best answer

I've written a solution that assumes the interpretation string always keeps the same format, i.e. {<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}.
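
Everything below depends on that fixed format, so one option is to check it up front and fail fast if it ever changes. Here is a minimal sketch. The class InterpretationFormat and the method isSupported are hypothetical. The pattern encodes only the assumption stated above: curly braces around one or more space-separated <type words> or <type digits> tags.

```java
import java.util.regex.Pattern;

public class InterpretationFormat {

    // Whole-string pattern: "{", one or more <type words|digits> tags separated by single spaces, "}".
    private static final Pattern FORMAT =
            Pattern.compile("\\{<\\w+ (?:words|digits)>(?: <\\w+ (?:words|digits)>)*\\}");

    // Hypothetical guard: true only if the interpretation has the assumed shape.
    public static boolean isSupported(String interpretation) {
        return FORMAT.matcher(interpretation).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSupported("{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}")); // true
        System.out.println(isSupported("<number_type_2 digits>")); // false: no curly braces
    }
}
```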

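I'll describe a Java 7 approach and a Java 8 approach. To be clear, my algorithm is a simple, straightforward brute-force approach. It compares every word with every interpretation, so the work grows with (number of words × number of interpretations). I couldn't come up with anything faster in the short time I had.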
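
For comparison, here is a different technique from the one this answer uses. A single regular expression matches a whole bracketed group at once and keeps whichever alternative the interpretation names, so each group is scanned only once. This is a sketch, not the answer's method. It assumes every group has exactly the shape [(words) {<type words>}(digits) {<type digits>}] seen in the question. The names GroupSelector and select are hypothetical. Types that the interpretation does not mention fall back to the words alternative, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSelector {

    // One interpretation tag, e.g. <number_type_0 words>.
    private static final Pattern TAG = Pattern.compile("<(\\w+) (words|digits)>");

    // One whole group: [(words) {<type words>}(digits) {<type digits>}]
    // group 1 = words text, group 2 = type, group 3 = digits text.
    private static final Pattern GROUP = Pattern.compile(
            "\\[\\(([^)]*)\\) \\{<(\\w+) words>\\}\\(([^)]*)\\) \\{<\\w+ digits>\\}\\]");

    public static String select(String wordsExpanded, String interpretation) {
        // type -> "words" or "digits"
        Map<String, String> modes = new HashMap<>();
        Matcher tag = TAG.matcher(interpretation);
        while (tag.find()) {
            modes.put(tag.group(1), tag.group(2));
        }

        List<String> out = new ArrayList<>();
        for (String token : wordsExpanded.split("\\|")) {
            token = token.trim();
            Matcher g = GROUP.matcher(token);
            if (g.matches()) {
                // Unknown types default to words (assumption).
                boolean useDigits = "digits".equals(modes.get(g.group(2)));
                // Digits like " 3  4 " become "34"; words keep their inner space.
                out.add(useDigits ? g.group(3).replaceAll("\\s", "") : g.group(1).trim());
            } else {
                out.add(token); // plain word such as "test": keep as is
            }
        }
        return String.join("|", out);
    }

    public static void main(String[] args) {
        String wordsExpanded = "test |  is |  [(thirty four) {<number_type_0 words>}( 3  4 ) {<number_type_0 digits>}] |  test |  [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] |  [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";
        System.out.println(select(wordsExpanded, "{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}"));
    }
}
```

With the question's strings, main prints test|is|thirty four|test|3|1.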

Let's walk through the code:

Java 7 style

/*
 * STEP 1: Create a method that accepts the wordsExpanded and
 * interpretation Strings.
 */
public static void parseString(String wordsExpanded, String interoperation) {
    /*
     * STEP 2: Remove all curly braces from the
     * interpretation String.
     */
    interoperation = interoperation.replaceAll("\\{", "");
    interoperation = interoperation.replaceAll("\\}", "");

    /*
     * STEP 3: Split the interpretation String at '>'
     * because we need the individual interpretations, like
     * "<number_type_2 words", to compare.
     */
    String[] allInterpretations = interoperation.split(">");

    /*
     * STEP 4: Split the wordsExpanded String at '|'
     * to get each word.
     */
    String[] allWordsExpanded = wordsExpanded.split("\\|");

    /*
     * STEP 5: Create a StringBuilder for the result.
     */
    StringBuilder resultBuilder = new StringBuilder();

    /*
     * STEP 6: Iterate over each word from the split
     * wordsExpanded String.
     */
    for (String eachWordExpanded : allWordsExpanded) {
        /*
         * STEP 7: Remove leading and trailing spaces.
         */
        eachWordExpanded = eachWordExpanded.trim();
        /*
         * STEP 8: Remove all curly braces.
         */
        eachWordExpanded = eachWordExpanded.replaceAll("\\{", "");
        eachWordExpanded = eachWordExpanded.replaceAll("\\}", "");

        /*
         * STEP 9: Now iterate over each interpretation.
         */
        for (String eachInteroperation : allInterpretations) {
            /*
             * STEP 10: Remove leading and trailing spaces
             * from each interpretation.
             */
            eachInteroperation = eachInteroperation.trim();

            /*
             * STEP 11: Append '>' to the end of each interpretation,
             * because we split them at '>' earlier.
             */
            eachInteroperation = eachInteroperation + ">";

            /*
             * STEP 12: Check whether this word contains the
             * current interpretation.
             */
            if (eachWordExpanded.contains(eachInteroperation)) {

                /*
                 * STEP 13: If the interpretation contains
                 * 'words', go to STEP 14;
                 * otherwise go to STEP 18.
                 */
                if (eachInteroperation.contains("words")) {
                    /*
                     * STEP 14: Remove that interpretation from
                     * the word.
                     *
                     * Ex: if the interpretation is <number_type_2 words>
                     * and it is found in the word, remove it.
                     */
                    eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");
                    /*
                     * STEP 15: Now change the interpretation to digits.
                     * Ex: if the interpretation is <number_type_2 words>,
                     * change it to <number_type_2 digits> and remove that too.
                     */
                    eachInteroperation = eachInteroperation.replaceAll("words", "digits");
                    eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");

                    /*
                     * STEP 16: Remove the square brackets.
                     */
                    eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
                    eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");

                    /*
                     * STEP 17: Remove the numbers in the form ( 3 ),
                     * since we want the words, then collapse runs of whitespace.
                     */
                    eachWordExpanded = eachWordExpanded.replaceAll("[(0-9)+]", "");
                    eachWordExpanded = eachWordExpanded.replaceAll("(\\s)+", " ");
                } else {
                    /*
                     * STEP 18: Remove the interpretation, just like STEP 14.
                     */
                    eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");
                    /*
                     * STEP 19: Change the interpretation to words, just like STEP 15
                     * (we are dealing with digits here), and remove that from
                     * the word too.
                     */
                    eachInteroperation = eachInteroperation.replaceAll("digits", "words");
                    eachWordExpanded = eachWordExpanded.replaceAll(eachInteroperation, "");

                    /*
                     * STEP 20: Remove the square brackets.
                     */
                    eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
                    eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
                    /*
                     * STEP 21: Remove the words in the form (thirty four),
                     * then remove all whitespace.
                     */
                    eachWordExpanded = eachWordExpanded.replaceAll("[(A-Za-z)+]", "");
                    eachWordExpanded = eachWordExpanded.replaceAll("\\s", "");
                }
            }
            // Otherwise this interpretation does not apply to the word; try the next one.
        }
        /*
         * STEP 22: Append the word and a '|' separator to the result.
         */
        resultBuilder.append(eachWordExpanded + "|");
    }
    /*
     * FINAL RESULT
     */
    System.out.println(resultBuilder.toString());
}
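
One detail of STEP 17 and STEP 21 is easy to misread. [(0-9)+] and [(A-Za-z)+] are character classes, not groups, so they delete every individual '(', ')', '+' and digit (or letter) in the string. The small demonstration below uses one of the question's groups after the tags and brackets have been removed. The class and method names are illustrative only:

```java
public class CharClassDemo {

    // Same as STEP 17: delete every '(', ')', '+' and digit, then collapse whitespace runs.
    static String keepWords(String group) {
        return group.replaceAll("[(0-9)+]", "").replaceAll("(\\s)+", " ");
    }

    // Same as STEP 21: delete every '(', ')', '+' and letter, then remove all whitespace.
    static String keepDigits(String group) {
        return group.replaceAll("[(A-Za-z)+]", "").replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        String group = "(thirty four) ( 3  4 ) ";
        System.out.println("[" + keepWords(group) + "]");  // [thirty four ]
        System.out.println("[" + keepDigits(group) + "]"); // [34]
    }
}
```

The words result keeps one trailing space. That is why thirty four, three and one are followed by a space in the Java 7 results.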

The equivalent Java 8 style is as follows:

import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public static void parseString(String wordsExpanded, String interoperation) {
    interoperation = interoperation.replaceAll("\\{", "");
    interoperation = interoperation.replaceAll("\\}", "");

    // Split at '>' and re-append '>' so each entry is a full tag like "<number_type_2 digits>".
    Set<String> allInterOperations = Arrays.stream(interoperation.split(">"))
            .map(eachInterOperation -> eachInterOperation.trim() + ">")
            .collect(Collectors.toSet());

    String result = Arrays.stream(wordsExpanded.split("\\|"))
            .map(eachWordExpanded -> {
                eachWordExpanded = eachWordExpanded.trim();
                eachWordExpanded = eachWordExpanded.replaceAll("\\{", "");
                eachWordExpanded = eachWordExpanded.replaceAll("\\}", "");

                for (String eachInterOperation : allInterOperations) {
                    if (eachWordExpanded.contains(eachInterOperation)) {
                        if (eachInterOperation.contains("words")) {
                            // Keep the words: drop both tags, the brackets and the digits.
                            eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
                            eachInterOperation = eachInterOperation.replaceAll("words", "digits");
                            eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
                            eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
                            eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
                            eachWordExpanded = eachWordExpanded.replaceAll("[(0-9)+]", "");
                            eachWordExpanded = eachWordExpanded.replaceAll("(\\s)+", " ");
                        } else {
                            // Keep the digits: drop both tags, the brackets and the words.
                            eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
                            eachInterOperation = eachInterOperation.replaceAll("digits", "words");
                            eachWordExpanded = eachWordExpanded.replaceAll(eachInterOperation, "");
                            eachWordExpanded = eachWordExpanded.replaceAll("\\[", "");
                            eachWordExpanded = eachWordExpanded.replaceAll("\\]", "");
                            eachWordExpanded = eachWordExpanded.replaceAll("[(A-Za-z)+]", "");
                            eachWordExpanded = eachWordExpanded.replaceAll("\\s", "");
                        }
                    }
                }
                // Trim so a words alternative such as "thirty four " has no trailing space.
                return eachWordExpanded.trim();
            })
            .collect(Collectors.joining("|"));

    System.out.println(result);
}
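
A design note that applies to both versions: String.replaceAll treats its first argument as a regular expression. The number_type tags contain no regex metacharacters, so the code works. If a tag ever contained characters such as ( or ., Pattern.quote or the literal String.replace would be the safer choice. Here is a quick illustration with a made-up tag. All names in it are hypothetical:

```java
import java.util.regex.Pattern;

public class LiteralReplaceDemo {

    // Treats tag as a regular expression, as replaceAll(eachInterOperation, "") does.
    static String removeAsRegex(String text, String tag) {
        return text.replaceAll(tag, "");
    }

    // Treats tag as literal text by quoting it first.
    static String removeLiteral(String text, String tag) {
        return text.replaceAll(Pattern.quote(tag), "");
    }

    public static void main(String[] args) {
        String tag = "(1)"; // hypothetical tag containing regex metacharacters
        System.out.println(removeAsRegex("a(1)b", tag)); // a()b: "(1)" is a group matching just "1"
        System.out.println(removeLiteral("a(1)b", tag)); // ab
        System.out.println("a(1)b".replace(tag, ""));    // ab: String.replace is always literal
    }
}
```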

Running the methods above with different interpretation strings, for example:

{<number_type_2 words> <number_type_1 words> <number_type_0 words>}
{<number_type_2 digits> <number_type_1 words> <number_type_0 words>}
{<number_type_2 digits> <number_type_1 digits> <number_type_0 digits>}
{<number_type_2 words> <number_type_1 digits> <number_type_0 digits>}

produces results like these (Java 7 results):

test|is|thirty four |test|three |one |
test|is|thirty four |test|three |1|
test|is|34|test|3|1|
test|is|34|test|3|one |

(Java 8 results)

test|is|thirty four|test|three|one
test|is|thirty four|test|three|1
test|is|34|test|3|1
test|is|34|test|3|one
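
One visible difference between the two result sets is the trailing |. The Java 7 version appends "|" after every word with a StringBuilder, whereas Collectors.joining("|") puts the separator only between elements. Here is a minimal illustration. The class and method names are made up for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SeparatorDemo {

    // Java 7 style: the separator is appended after every element, including the last.
    static String appendEach(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append("|");
        }
        return sb.toString();
    }

    // Java 8 style: the separator goes only between elements.
    static String joined(List<String> words) {
        return words.stream().collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("test", "is", "34");
        System.out.println(appendEach(words)); // test|is|34|
        System.out.println(joined(words));     // test|is|34
    }
}
```

On Java 8 and later, String.join("|", words) gives the same result as the stream version without creating a stream.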

I hope this is what you are trying to achieve.

For java - Grouping strings into an array, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42196533/
