Identify a given pattern in a vector and add the missing elements to obtain repetitions of the pattern

Tags: r pattern-matching sequence

This question is related to "Wide a dataframe and insert missing columns".

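Suppose we have a given pattern of 5 elements, in this order: "A", "B", "C", "D", "E".

This pattern is repeated 10 times, but some elements are sometimes missing (see the image of my vector, in orange).

Is it possible in R to identify the repeated pattern and fill in the missing elements (see the image of my desired output)?

My vector: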

my.vector <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "B", "C", 
               "D", "E", "B", "C", "D", "E", "B", "C", "D", "E", "B", "C", "D", 
               "E", "B", "C", "D", "E", "B", "C", "D", "E", "A", "B", "C", "D", 
               "E", "B")

my.vector
 [1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "A" "B" "C" "D" "E" "B"

Graphical explanation:

Given pattern: [image]

My vector: [image]

Desired output, with the elements to add marked in red: [image]

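Best answer

Create a grouping vector from the diff of the match index of my.vector against LETTERS[1:5], split on it (or use any other grouping function such as tapply), take the union of LETTERS[1:5] with each group, then unlist the list and unname the result.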

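# start a new group wherever the matched index does not increase by exactly 1
grp <- cumsum(c(TRUE, diff(match(my.vector, LETTERS[1:5])) != 1))
# fill each group up to the full pattern, then flatten to a plain vector
unname(unlist(lapply(split(my.vector, grp),
       function(x) union(LETTERS[1:5], x))))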

-output

[1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A"
[37] "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
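
To see why this grouping works, it may help to look at the intermediate values. The following is only a sketch in base R using the vector from the question; the names pattern, idx, new.group, filled, gap, grp.strict and grp.loose are introduced here for illustration, and the gap input is a hypothetical case that does not occur in the question.

```r
my.vector <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "B", "C",
               "D", "E", "B", "C", "D", "E", "B", "C", "D", "E", "B", "C", "D",
               "E", "B", "C", "D", "E", "B", "C", "D", "E", "A", "B", "C", "D",
               "E", "B")
pattern <- LETTERS[1:5]

# position of each element within the pattern (A = 1, ..., E = 5)
idx <- match(my.vector, pattern)

# a new repetition starts wherever the position does not increase by exactly 1
new.group <- c(TRUE, diff(idx) != 1)

# running count of the starts = repetition id of every element
grp <- cumsum(new.group)

# union(pattern, x) returns the full pattern whenever x is a subset of it
filled <- unname(unlist(lapply(split(my.vector, grp),
                               function(x) union(pattern, x))))

head(idx, 12)   # 1 2 3 4 5 1 2 3 4 5 2 3
max(grp)        # 10
length(filled)  # 50

# hypothetical input with a gap in the middle of one repetition
gap <- c("A", "C", "D", "E")
grp.strict <- cumsum(c(TRUE, diff(match(gap, pattern)) != 1))  # 1 2 2 2
grp.loose  <- cumsum(c(TRUE, diff(match(gap, pattern)) <= 0))  # 1 1 1 1
```

In the question's vector only the leading "A" (or the trailing elements) are missing, so the != 1 test is enough. If an element could be missing from the middle of a repetition, the != 1 test would split that repetition in two, and starting a new group only when the index fails to increase (<= 0) may be the safer condition.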

Another option is complete from tidyr

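library(dplyr)
library(tidyr)
library(data.table)  # only needed for rowid()
tibble(col1 = my.vector) %>%
    group_by(rn = rowid(col1)) %>%     # rn = occurrence number of each letter
    complete(col1 = LETTERS[1:5]) %>%  # add the letters missing from each group
    ungroup %>%
    pull(col1)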

-output

 [1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A"
[37] "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
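
The grouping in this second approach relies on rowid(col1) numbering the occurrences of each value, so that the k-th "B" lands in group k. As a rough illustration without any packages, the sketch below builds the same kind of occurrence counter with base R's ave(); the names pattern, occ, n.rep and filled2 are introduced here and are not part of the answer.

```r
my.vector <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "B", "C",
               "D", "E", "B", "C", "D", "E", "B", "C", "D", "E", "B", "C", "D",
               "E", "B", "C", "D", "E", "B", "C", "D", "E", "A", "B", "C", "D",
               "E", "B")
pattern <- LETTERS[1:5]

# occurrence counter per value: the 1st "A" gets 1, the 2nd "A" gets 2, ...
occ <- ave(seq_along(my.vector), my.vector, FUN = seq_along)

# the number of groups is the largest occurrence count of any letter
n.rep <- max(occ)

# completing every group to the full pattern then amounts to a plain repetition
filled2 <- rep(pattern, times = n.rep)

table(my.vector)  # A appears 3 times, B 10 times, C, D and E 9 times each
n.rep             # 10
```

Because this grouping counts occurrences instead of detecting breaks in the order, it gives the right number of repetitions only when at least one letter is present in every repetition, as "B" is here.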

This question and its answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/68885775/
