java - Matching words with a regex that also handles apostrophes

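I have to split a line of text into words and am confused about which regex to use. I have looked everywhere for a regex that matches a word and found ones similar to the post below, but I want to use it in Java (Java doesn't handle \ in regular strings).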

Regex to match words and those with an apostrophe

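I have tried the regex from each answer there, but I'm not sure how to structure the regex for Java (I assumed all regexes were the same). If I replace \ with \\ in the regexes I've seen, the regex doesn't work.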
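For reference, here is a minimal sketch of the escaping rule in question: inside a Java string literal each regex backslash is written twice, so the regex \w+ is typed as "\\w+". The class name EscapeDemo and the sample inputs are made up for illustration only.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The regex engine should receive \w+ (three characters).
    // In Java source the backslash itself must be escaped, hence "\\w+".
    static final String REGEX = "\\w+";

    public static void main(String[] args) {
        System.out.println(REGEX);                           // the string the regex engine sees
        System.out.println(REGEX.length());                  // 3 characters: \ w +
        System.out.println(Pattern.matches(REGEX, "like"));  // input made only of word characters
        System.out.println(Pattern.matches(REGEX, "don't")); // the apostrophe is not a \w character
    }
}
```

Doubling the backslashes only changes how the pattern is typed in source code. If a pattern still fails after that, the cause is more likely the pattern itself or the way it is applied (for example split versus find).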

I have also tried looking it up myself and ended up on this page: http://www.regular-expressions.info/reference.html

But I can't wrap my head around the advanced regex techniques.

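I'm using String.split(regex string here) to separate my string. For example, if I'm given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match: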

I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve

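I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My delimiter condition should be something like: [match any word characters][also match an apostrophe if it is preceded by a word character, and then match the word characters after it, if there are any]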

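All I've got so far is a simple regex that matches words, [\w], but I'm not sure how to use lookahead or lookbehind to match the apostrophe and then the rest of the word.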
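One way to read the condition described above is as a single pattern with no lookaround at all: one or more word characters, optionally followed by an apostrophe and any further word characters. The sketch below assumes that reading. It also collects matches with Matcher.find() instead of String.split, since split describes the separators rather than the words. The class name ApostropheWords, the method name words, and the pattern itself are illustrative assumptions, not taken from the linked post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheWords {
    // Assumed reading of the condition: \w+ then an optional ('\w*) tail.
    // A lone apostrophe never matches because the pattern must start with \w.
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w*)?");

    // Returns every word found in the line, in order of appearance.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("I don't like to eat everyone's food, or they'll starve."));
        System.out.println(words("'' '''' ' ' '.'")); // no word characters, so nothing is matched
    }
}
```

Note that under this reading a trailing apostrophe stays attached to the word (for example dogs'), and a leading one (as in 'tis) is dropped. If a different behavior is wanted, the optional tail would need to change.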

Best answer

Using WhirlWind's answer on the page mentioned in my comment, you can do the following:

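import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Test input: ordinary words plus apostrophe-only tokens that must not match.
String candidate = "I \n"+
    "like \n"+
    "to "+
    "eat "+
    "but "+
    "I "+
    "don't "+
    "like "+
    "to "+
    "eat "+
    "everyone's "+
    "food "+
    "''  ''''  '.' ' "+
    "or "+
    "they'll "+
    "starv'e'";

// Alternatives, tried left to right: 'word, word'word, word', word.
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
Matcher matcher = Pattern.compile(regex).matcher(candidate);
// find() walks the input and reports each match in turn.
while (matcher.find()) {
  System.out.println("> matched: `" + matcher.group() + "`");
}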

It will print:

> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`

You can find a running example here: http://ideone.com/pVOmSK
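A note on why the order of the alternatives in that regex matters: Java tries the alternatives from left to right and takes the first one that matches at the current position. The sketch below compares the regex from the answer with a variant that puts the bare-word alternative first. The class name AlternationOrder, the helper findAll, and the shortened second pattern are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Hypothetical helper: collects every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "don't they'll";
        // Order from the answer: apostrophe forms are tried before the bare word.
        System.out.println(findAll("('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)", text));
        // Bare word first: it wins at each position, so words are cut at the apostrophe.
        System.out.println(findAll("(\\w+)|(\\w+'\\w+)", text));
    }
}
```

With the bare-word alternative first, the apostrophe forms are never reached, which is presumably why (\w+) is listed last in the answer.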

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/13632679/
