regex - R: using tidyr::separate to split a string at the last whitespace character

Suppose I have a data frame like this:

df<-data.frame(a=c("AA","BB"),b=c("short string","this is the longer string"))

I want to split each string at the last space, using a regular expression.
I tried:

library(dplyr)
library(tidyr)
df%>%
  separate(b,c("partA","partB"),sep=" [^ ]*$")

But this drops the second part of the string from the output. My desired output looks like this:

   a              partA  partB
1 AA              short string
2 BB this is the longer string

How can I do this? It would be nice if I could use tidyr and dplyr for it.

Best answer

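We can use extract from tidyr with capture groups ( (...) ). We match zero or more characters ( .* ) and place them inside parentheses ( (.*) ), followed by one or more whitespace characters ( \\s+ ), followed by the next capture group, which contains one or more non-space characters ( [^ ]+ ) up to the end of the string ( $ ). Because .* is greedy, it consumes as much as it can, so \\s+ ends up matching the last whitespace in the string.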
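To see what each capture group picks up, the same pattern can be tried on plain character vectors first. This is a minimal base R sketch, not part of the original answer; the names x, pattern, part_a and part_b are made up for illustration.

```r
# Hypothetical test vector mirroring column b of df
x <- c("short string", "this is the longer string")

# Same pattern as passed to extract():
# group 1 = everything before the last whitespace, group 2 = the last word
pattern <- "(.*)\\s+([^ ]+)$"

# Replace the whole match with one capture group at a time
part_a <- sub(pattern, "\\1", x)
part_b <- sub(pattern, "\\2", x)

print(part_a)
print(part_b)
```

The greedy .* first runs to the end of the string and then backtracks only as far as needed for \\s+ and [^ ]+ to match, which is why the split lands on the last space rather than the first.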

library(tidyr)
extract(df, b, into = c('partA', 'partB'), '(.*)\\s+([^ ]+)$')
#   a              partA  partB
#1 AA              short string
#2 BB this is the longer string
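If staying with separate is preferred, one possible variant is to make the separator match only the space itself and check the rest with a lookahead. The attempt in the question loses partB because separate discards everything the sep pattern matches, and " [^ ]*$" matches the last word along with the space. The sketch below assumes the regex engine used by separate supports lookahead; res is a name introduced here for illustration.

```r
library(tidyr)

# Same data frame as in the question
df <- data.frame(a = c("AA", "BB"),
                 b = c("short string", "this is the longer string"))

# The separator is a single space, but only where the rest of the string
# up to the end contains no further space (i.e. the last space).
# The lookahead (?=...) checks that condition without consuming the last word.
res <- separate(df, b, into = c("partA", "partB"), sep = " (?=[^ ]+$)")
print(res)
```

Under that assumption, this should give the same partA and partB columns as the extract call above.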

For more on splitting a string at the last whitespace character with tidyr::separate in R, see the similar question on Stack Overflow: https://stackoverflow.com/questions/32119963/
