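R: repeating values based on the values in a data frame

I am trying to create a long data frame whose values are generated from a lookup data frame.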

df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))

df_lookup
#>   id one two three
#> 1  1  10   0     0
#> 2  2   9   1     0
#> 3  3   7   2     1

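The output I am looking for is a data frame with 30 rows: the first 10 rows are all 1s, the next 10 rows are nine 1s and one 2, and the last 10 rows are seven 1s, two 2s and one 3.

Based on some online searching for similar questions, e.g. here

I was able to come up with the following code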
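To make the target concrete, here is a minimal sketch of the expected values for a single id, using base R's `rep()` with a `times` vector. The names `counts_id3` and `bins_id3` are hypothetical and used only for illustration; the counts are copied from the row for id 3 in `df_lookup`.

```r
# Counts for id 3, copied from the lookup table: seven 1s, two 2s, one 3
counts_id3 <- c(one = 7, two = 2, three = 1)

# Repeat the column position (1, 2, 3) as many times as its count
bins_id3 <- rep(seq_along(counts_id3), times = counts_id3)
print(bins_id3)
#> [1] 1 1 1 1 1 1 1 2 2 3
```

The full output is this pattern repeated for every id, stacked into one data frame.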

df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))

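library(data.table)

col_names = c("one","two","three")
setDT(df_lookup)  # convert to a data.table by reference

df_output = data.frame()

# for each count column j, repeat the bin number j that many times within each id
for (j in seq_along(col_names)){
  temp_df = df_lookup[, .(rep(j, get(col_names[j]))),.(id)]
  df_output = rbind(df_output,temp_df)
}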

names(df_output) = c("id","bin")

df_output = df_output[order(df_output$id,df_output$bin),]

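Although this does the job, it can take a while to run when I need to loop over many "id" values or many "df_lookup" tables.

So I wanted to check whether there is a better or faster way to build "df_output".

Best answer

A data.table solution using melt() and rep()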

library(data.table)

df_lookup = data.frame(id = c(1,2,3),
                       one = c(10,9,7),
                       two = c(0,1,2), 
                       three = c(0,0,1))

dt <- data.table::as.data.table(df_lookup)

# into long format
dt_melt <- melt(dt, id.vars = "id")
dt_melt
#>    id variable value
#> 1:  1      one    10
#> 2:  2      one     9
#> 3:  3      one     7
#> 4:  1      two     0
#> 5:  2      two     1
#> 6:  3      two     2
#> 7:  1    three     0
#> 8:  2    three     0
#> 9:  3    three     1
dt_exploded <- dt_melt[, rep(variable, value), by = id]
dt_exploded[, bin := data.table::fcase(V1 == "one", 1,
                                       V1 == "two", 2,
                                       V1 == "three", 3)][]
#>     id    V1 bin
#>  1:  1   one   1
#>  2:  1   one   1
#>  3:  1   one   1
#>  4:  1   one   1
#>  5:  1   one   1
#>  6:  1   one   1
#>  7:  1   one   1
#>  8:  1   one   1
#>  9:  1   one   1
#> 10:  1   one   1
#> 11:  2   one   1
#> 12:  2   one   1
#> 13:  2   one   1
#> 14:  2   one   1
#> 15:  2   one   1
#> 16:  2   one   1
#> 17:  2   one   1
#> 18:  2   one   1
#> 19:  2   one   1
#> 20:  2   two   2
#> 21:  3   one   1
#> 22:  3   one   1
#> 23:  3   one   1
#> 24:  3   one   1
#> 25:  3   one   1
#> 26:  3   one   1
#> 27:  3   one   1
#> 28:  3   two   2
#> 29:  3   two   2
#> 30:  3 three   3
#>     id    V1 bin

You can always map one to 1, and so on.
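One possible way to do that mapping without spelling out each case is to rely on the factor codes of `variable`. This is a sketch, not part of the original answer: it assumes the count columns appear in the lookup table in the order `one`, `two`, `three`, so that `melt()` (which returns `variable` as a factor by default) assigns them the integer codes 1, 2, 3. The name `dt_bins` is hypothetical.

```r
library(data.table)

df_lookup <- data.frame(id = c(1, 2, 3),
                        one = c(10, 9, 7),
                        two = c(0, 1, 2),
                        three = c(0, 0, 1))

# Long format: one row per (id, count column) pair
dt_melt <- melt(as.data.table(df_lookup), id.vars = "id")

# `variable` is a factor whose levels follow the column order (assumed here:
# one, two, three), so as.integer() yields the bin numbers 1, 2, 3 directly.
# rep() then repeats each bin number `value` times within each id.
dt_bins <- dt_melt[, .(bin = rep(as.integer(variable), value)), by = id]
```

If the columns are in a different order, or the bin numbers are not simply 1..n, an explicit lookup such as the `fcase()` call above is the safer choice.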

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/73413403/
