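regex - How do I extract the first entry matching a regular expression for each name?

Tags: regex r

I want to extract, for each name, the first entry that matches a regular expression.

Here is some data: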

# Here I define the dates:
dates <- as.Date(as.character(c("2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17",
                           "2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17",
                           "2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17")))
# Here I define the names
Name <- c("Andy","Andy","Andy","Andy","Andy","Jo","Jo","Jo","Jo","Jo","Me","Me","Me","Me","Me")
# Here I define the status strings
status <- c("ID: 10 -> 1","ID: 11 -> 0","ID: 3 -> 5","ID: 20 -> 4","ID: 1 -> 5","ID: 1 -> 1","ID: 3 -> 2","ID: 20 -> 5","ID: 10 -> 5","ID: 11 -> 5","ID: 12 -> 1","ID: 30 -> 2","ID: 30 -> 5","ID: 30 -> 2","ID: 30 -> 5")

# Put everything together
data <- data.frame(Name, dates, status)
# Desired output: the column condition_met is 1 only for the first
# change of ID from something to 5 within each Name, and 0 otherwise
  Name      dates      status     condition_met
1  Andy 2011-01-13 ID: 10 -> 1     0
2  Andy 2011-01-14 ID: 11 -> 0     0
3  Andy 2011-01-15 ID: 3 -> 5      1
4  Andy 2011-01-16 ID: 20 -> 4     0
5  Andy 2011-01-17 ID: 1 -> 5      0
6    Jo 2011-01-13 ID: 1 -> 1      0
7    Jo 2011-01-14 ID: 3 -> 2      0
8    Jo 2011-01-15 ID: 20 -> 5     1
9    Jo 2011-01-16 ID: 10 -> 5     0
10   Jo 2011-01-17 ID: 11 -> 5     0
11   Me 2011-01-13 ID: 12 -> 1     0
12   Me 2011-01-14 ID: 30 -> 2     0
13   Me 2011-01-15 ID: 30 -> 5     1
14   Me 2011-01-16 ID: 30 -> 2     0
15   Me 2011-01-17 ID: 30 -> 5     0

I tried this:

data$condition_met <- ifelse(grepl("-> 5",data$status),1,0)

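This produces a condition_met column, but unfortunately it flags every "-> 5" rather than only the minimum, that is, the first "-> 5" for each name.

Best answer

We can create a function that flags the first match of the condition, then call it by group using base R, data.table, or dplyr: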

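# Flag the first element containing the literal text "-> 5"; if nothing matches, every flag is 0 rather than NA
condition <- function(x) as.integer(seq_along(x) == match(TRUE, grepl("-> 5", x, fixed = TRUE), nomatch = 0L))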

# base R (as.character() keeps this working if status was read in as a factor)
data$condition_met <- as.integer(ave(as.character(data$status), data$Name, FUN = condition))

#data.table
library(data.table)
setDT(data)[, condition_met := condition(status), by = Name]

#dplyr
library(dplyr)
data %>% group_by(Name) %>% mutate(condition_met = condition(status))
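
The same flag can also be computed without a per-group helper, by counting matches cumulatively within each name. The following is only a sketch under stated assumptions, not part of the original answer: it assumes the rows are already ordered by date within each name (as in the question), it uses a shortened sample of the question's data, and the name "Al" is a hypothetical addition that shows what happens when a name has no "-> 5" entry at all.

```r
# Small sample in the question's format; "Al" is a hypothetical extra
# name with no "-> 5" entry, added to show the no-match case.
data <- data.frame(
  Name   = c("Andy", "Andy", "Andy", "Jo", "Jo", "Al", "Al"),
  status = c("ID: 10 -> 1", "ID: 3 -> 5", "ID: 1 -> 5",
             "ID: 20 -> 5", "ID: 10 -> 5",
             "ID: 2 -> 1", "ID: 4 -> 3"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose status contains the literal text "-> 5"
hit <- grepl("-> 5", data$status, fixed = TRUE)

# Running count of matches within each Name
hit_count <- ave(as.integer(hit), data$Name, FUN = cumsum)

# A row is the first match when it matches and the running count is 1
data$condition_met <- as.integer(hit & hit_count == 1)

print(data)
```

Because the running count never reaches 1 for a name without a match, such a name simply gets 0 in every row.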

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/38590617/
