java - Remove strings from another string in Java

Tags: java string

Suppose I have this list of words:

 String[] stopWords = new String[]{"i","a","and","about","an","are","as","at","be","by","com","for","from","how","in","is","it","not","of","on","or","that","the","this","to","was","what","when","where","who","will","with","the","www"};

Then I have this text:

 String text = "I would like to do a nice novel about nature AND people";

Is there a method that matches the stop words and removes them while ignoring case? Something like this:

 String noStopWordsText = remove(text, stopWords);

Result:

 " would like do nice novel nature people"

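If you know a regex for this, that would work fine, but I would really prefer something like a Commons-style solution that is a bit more performance-oriented.

By the way, right now I am using this Commons method (`StringUtils.replaceEach`), which lacks proper case-insensitive handling, so I have to list every stop word in both lower and upper case: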

 private static final String[] stopWords = new String[]{"i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "the", "www", "I", "A", "AND", "ABOUT", "AN", "ARE", "AS", "AT", "BE", "BY", "COM", "FOR", "FROM", "HOW", "IN", "IS", "IT", "NOT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WHAT", "WHEN", "WHERE", "WHO", "WILL", "WITH", "THE", "WWW"};
 private static final String[] blanksForStopWords = new String[]{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""};

 noStopWordsText = StringUtils.replaceEach(text, stopWords, blanksForStopWords);     

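Best answer

Build a regular expression from the stop words, make it case-insensitive, and then use the matcher's replaceAll method to replace every match with an empty string: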

import java.util.regex.*;

Pattern stopWords = Pattern.compile("\\b(?:i|a|and|about|an|are|...)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher matcher = stopWords.matcher("I would like to do a nice novel about nature AND people");
String clean = matcher.replaceAll("");

The ... in the pattern is just me being lazy; continue the list of stop words there.
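Rather than typing the alternation by hand, the pattern can be assembled from the array. The following is a minimal sketch under a few assumptions: the class name `StopWordRemover`, the constant names, and the shortened five-word list are illustrative only and do not come from the answer. `Pattern.quote` is used so that a stop word containing regex metacharacters would still be matched literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; the names here are assumptions, not part of the original answer.
class StopWordRemover {
    // Shortened list for illustration; substitute the full stop-word array from the question.
    private static final String[] STOP_WORDS = {"i", "a", "and", "about", "to"};

    // Compiled once and reused: \b(?:w1|w2|...)\b\s* with case-insensitive matching.
    private static final Pattern STOP_WORD_PATTERN = Pattern.compile(
            Arrays.stream(STOP_WORDS)
                  .map(Pattern::quote) // treat each stop word as a literal
                  .collect(Collectors.joining("|", "\\b(?:", ")\\b\\s*")),
            Pattern.CASE_INSENSITIVE);

    static String remove(String text) {
        return STOP_WORD_PATTERN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: would like do nice novel nature people
        System.out.println(remove("I would like to do a nice novel about nature AND people"));
    }
}
```

Note that the result has no leading space, unlike the expected output in the question, because the trailing \s* also swallows the space after "I".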

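Another approach is to loop over all the stop words and call String.replaceAll for each one. The problem with that approach is that replaceAll compiles a new regular expression on every call, so it is inefficient inside a loop. Also, String.replaceAll has no parameter for the flag that makes the regex case-insensitive; you would have to embed (?i) in the pattern itself.

Edit: I added \b around the pattern so that it matches whole words only. I also added \s* so that it swallows any whitespace following the word, which may not be necessary.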
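To make the effect of \b and \s* concrete, here is a small sketch that applies three variants of the pattern to the same text. The class name `WordBoundaryDemo`, the helper `strip`, and the two-word stop list are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class WordBoundaryDemo {
    // Removes every case-insensitive match of the given regex from the text.
    static String strip(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "a nice novel about nature";

        // No word boundaries: the "a" inside "nature" is removed as well.
        System.out.println("[" + strip("(?:about|a)", text) + "]");
        // prints: [ nice novel  nture]

        // With \b: only whole words match, but the spaces around them are left behind.
        System.out.println("[" + strip("\\b(?:about|a)\\b", text) + "]");
        // prints: [ nice novel  nature]

        // With \b and \s*: the whitespace after each removed word goes too.
        System.out.println("[" + strip("\\b(?:about|a)\\b\\s*", text) + "]");
        // prints: [nice novel nature]
    }
}
```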

Regarding "java - Remove strings from another string in Java", the original question is on Stack Overflow: https://stackoverflow.com/questions/4769282/
