java - How to truncate a string after n words in Java?

Tags: java string

Is there a library that has a routine for truncating a string after n words? I'm looking for something that can turn:

truncateAfterWords(3, "hello, this\nis a long sentence");

into

"hello, this\nis"

I could write it myself, but I figured something like this might already exist in some open-source string manipulation library.


Here is the full list of test cases that I'd expect any solution to pass:

import java.util.regex.*;

public class Test {

    private static final TestCase[] TEST_CASES = new TestCase[]{
        new TestCase(5, null, null),
        new TestCase(5, "", ""),
        new TestCase(5, "single", "single"),
        new TestCase(1, "single", "single"),
        new TestCase(0, "single", ""),
        new TestCase(2, "two words", "two words"),
        new TestCase(1, "two words", "two"),
        new TestCase(0, "two words", ""),
        new TestCase(2, "line\nbreak", "line\nbreak"),
        new TestCase(1, "line\nbreak", "line"),
        new TestCase(2, "multiple  spaces", "multiple  spaces"),
        new TestCase(1, "multiple  spaces", "multiple"),
        new TestCase(3, " starts with space", " starts with space"),
        new TestCase(2, " starts with space", " starts with"),
        new TestCase(10, "A full sentence, with puncutation.", "A full sentence, with puncutation."),
        new TestCase(4, "A full sentence, with puncutation.", "A full sentence, with"),
        new TestCase(50, "Testing a very long number of words in the testcase to see if the solution performs well in such a situation.  Some solutions don't do well with lots of input.", "Testing a very long number of words in the testcase to see if the solution performs well in such a situation.  Some solutions don't do well with lots of input."),
    };

    public static void main(String[] args){
        for (TestCase t: TEST_CASES){
            try {
                String r = truncateAfterWords(t.n, t.s);
                if (!t.equals(r)){
                    System.out.println(t.toString(r));
                }
            } catch (Exception x){
                System.out.println(t.toString(x));
            }       
        }   
    }

    public static String truncateAfterWords(int n, String s) {
        // TODO: implementation
        return null;
    }
}


class TestCase {
    public int n;
    public String s;
    public String e;

    public TestCase(int n, String s, String e){
        this.n=n;
        this.s=s;
        this.e=e;
    }

    public String toString(){
        return "truncateAfterWords(" + n + ", " + toJavaString(s) + ")\n  expected: " + toJavaString(e);
    }

    public String toString(String r){
        return this + "\n  actual:   " + toJavaString(r) + "";
    }

    public String toString(Exception x){
        return this + "\n  exception: " + x.getMessage();
    }    

    public boolean equals(String r){
        if (e == null && r == null) return true;
        if (e == null) return false;
        return e.equals(r);
    }   

    public static final String escape(String s){
        if (s == null) return null;
        s = s.replaceAll("\\\\","\\\\\\\\");
        s = s.replaceAll("\n","\\\\n");
        s = s.replaceAll("\r","\\\\r");
        s = s.replaceAll("\"","\\\\\"");
        return s;
    }

    private static String toJavaString(String s){
        if (s == null) return "null";
        return " \"" + escape(s) + "\"";
    }
}
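For comparison, here is a hand-rolled sketch that fills in the `truncateAfterWords` TODO without regular expressions. It assumes a "word" is a maximal run of letters, digits or underscores; `Character.isLetterOrDigit` is Unicode-aware, so this is only an approximation of Java's default (ASCII) `\w`. The class name `TruncateWords` and the helper `isWordChar` are hypothetical. Tracing the cases above by hand, it should pass all of them.

```java
public class TruncateWords {

    // Hypothetical helper: letters, digits and '_' count as word characters,
    // roughly mirroring \w for ASCII input.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Returns s cut just after the end of its n-th word. Strings with fewer
    // than n words (or exactly n, ending on a word) come back unchanged.
    public static String truncateAfterWords(int n, String s) {
        if (s == null) return null;
        if (n <= 0) return "";
        int words = 0;
        for (int i = 0; i < s.length(); i++) {
            // A word ends at i if s[i] is a word char and s[i+1] is not (or is past the end).
            boolean endsWord = isWordChar(s.charAt(i))
                    && (i + 1 == s.length() || !isWordChar(s.charAt(i + 1)));
            if (endsWord && ++words == n) {
                return s.substring(0, i + 1);
            }
        }
        return s; // fewer than n words: keep everything
    }

    public static void main(String[] args) {
        System.out.println(truncateAfterWords(3, "hello, this\nis a long sentence"));
    }
}
```

A single left-to-right scan keeps the cost linear in the length of the input, whatever the value of n.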

There are solutions for other languages on this site.

Best answer

You can use a simple regex-based solution:

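public static String truncateAfterWords(int n, String str) {
   if (str == null) return null;
   // (?s): .* also spans line breaks; possessive \W*+ / \w++ keep each word whole
   return str.replaceAll("(?s)^((?:\\W*+\\w++){" + Math.max(n, 0) + "}).*$", "$1");
}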
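With plain greedy quantifiers, as in `(?:\W*\w+){n}`, the `\w+` inside the counted group may give characters back, so a single word can be carved into several "words". On input with fewer than n words the engine may then try a very large number of such splits before failing, which is a likely source of the slowdown the update below addresses. The sketch below contrasts that with the possessive forms `\W*+` and `\w++`, which never give characters back; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BacktrackingDemo {
    // Counted group with greedy quantifiers: backtracking may split one word.
    static boolean greedyFinds(int n, String s) {
        return Pattern.compile("^(?:\\W*\\w+){" + n + "}").matcher(s).find();
    }

    // Same group with possessive quantifiers: once \w++ has consumed a word it
    // keeps all of it, so each repetition must match a whole word.
    static boolean possessiveFinds(int n, String s) {
        return Pattern.compile("^(?:\\W*+\\w++){" + n + "}").matcher(s).find();
    }

    public static void main(String[] args) {
        // Backtracking lets \w+ carve "single" into five pieces (e.g. s, i, n, g, le).
        System.out.println(greedyFinds(5, "single"));     // true
        System.out.println(possessiveFinds(5, "single")); // false
    }
}
```

This is also how the greedy version can "find" more words than the input really has, and then cut the string at a point the caller did not expect.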

Live demo: http://ideone.com/Nsojc7

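Update: addressing the performance issue raised in your comment:

For better performance when dealing with a large number of words, use the following method instead: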

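// Zero-width match at the end of each word: a word boundary preceded by a word character.
private static final Pattern WB_PATTERN = Pattern.compile("(?<=\\w)\\b");

public static String truncateAfterWords(int n, String s) {
   if (s == null) return null;
   if (n <= 0) return "";
   Matcher m = WB_PATTERN.matcher(s);
   // Advance to the end of the n-th word (stops early if there are fewer words).
   for (int i = 0; i < n && m.find(); i++);
   // hitEnd() is true when the search ran into the end of the input: either there
   // were fewer than n words, or the n-th word ends the string. Keep it all.
   if (m.hitEnd())
      return s;
   else
      return s.substring(0, m.end());
}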
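To see what the loop above steps through, the sketch below collects every offset where `(?<=\w)\b` matches; each is the index just past a word, so `substring(0, m.end())` after the n-th `find()` keeps exactly the first n words. The class `WordEndDemo` and method `wordEnds` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordEndDemo {
    // Same pattern as WB_PATTERN: a word boundary preceded by a word character.
    private static final Pattern WORD_END = Pattern.compile("(?<=\\w)\\b");

    // Collects the offset just past each word in s.
    static List<Integer> wordEnds(String s) {
        List<Integer> ends = new ArrayList<>();
        Matcher m = WORD_END.matcher(s);
        while (m.find()) {
            ends.add(m.end()); // zero-width match, so start() == end()
        }
        return ends;
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("hello, this\nis")); // [5, 11, 14]
    }
}
```

With n = 2, the method above would stop at offset 11 and return `"hello, this"`.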

Original Stack Overflow question ("java - How to truncate a string after n words in Java?"): https://stackoverflow.com/questions/15955384/
