我正在尝试创建一个长数据框,其值是从查找数据框创建的
df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))
df_lookup
#> id one two three
#> 1 1 10 0 0
#> 2 2 9 1 0
#> 3 3 7 2 1
我正在寻找的输出是一个包含 30 行的数据框,其中前 10 行都是 1,接下来 10 行的值为 9 个 1 和 1 个 2,最后 10 行的值为 7 个 1,两个 2 和一个 3。
df_output
id bin
1: 1 1
2: 1 1
3: 1 1
4: 1 1
5: 1 1
6: 1 1
7: 1 1
8: 1 1
9: 1 1
10: 1 1
11: 2 1
12: 2 1
13: 2 1
14: 2 1
15: 2 1
16: 2 1
17: 2 1
18: 2 1
19: 2 1
20: 2 2
21: 3 1
22: 3 1
23: 3 1
24: 3 1
25: 3 1
26: 3 1
27: 3 1
28: 3 2
29: 3 2
30: 3 3
基于一些类似问题的在线搜索,例如 here
我能够想出以下代码
df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))
col_names = c("one","two","three")
setDT(df_lookup)
df_output = data.frame()
for (j in 1:length(col_names)){
temp_df = df_lookup[, .(rep(j, get(as.character(col_names[j])))),.(id)]
df_output = rbind(df_output,temp_df)
}
names(df_output) = c("id","bin")
df_output = df_output[order(df_output$id,df_output$bin),]
虽然这解决了目的,但当我需要循环遍历许多“id”或许多“df_lookup”表时,可能需要一些时间才能运行。
所以想检查是否有任何最佳/更快的方法来实现“df_output”
最佳答案
使用 melt()
和 rep()
的 data.table
解决方案
library(data.table)
df_lookup = data.frame(id = c(1,2,3),
one = c(10,9,7),
two = c(0,1,2),
three = c(0,0,1))
dt <- data.table::as.data.table(df_lookup)
# into long format
dt_melt <- melt(dt, id.vars = "id")
dt_melt
#> id variable value
#> 1: 1 one 10
#> 2: 2 one 9
#> 3: 3 one 7
#> 4: 1 two 0
#> 5: 2 two 1
#> 6: 3 two 2
#> 7: 1 three 0
#> 8: 2 three 0
#> 9: 3 three 1
dt_exploded <- dt_melt[, rep(variable, value), by = id]
dt_exploded[, bin := data.table::fcase(V1 == "one", 1,
V1 == "two", 2,
V1 == "three", 3)][]
#> id V1 bin
#> 1: 1 one 1
#> 2: 1 one 1
#> 3: 1 one 1
#> 4: 1 one 1
#> 5: 1 one 1
#> 6: 1 one 1
#> 7: 1 one 1
#> 8: 1 one 1
#> 9: 1 one 1
#> 10: 1 one 1
#> 11: 2 one 1
#> 12: 2 one 1
#> 13: 2 one 1
#> 14: 2 one 1
#> 15: 2 one 1
#> 16: 2 one 1
#> 17: 2 one 1
#> 18: 2 one 1
#> 19: 2 one 1
#> 20: 2 two 2
#> 21: 3 one 1
#> 22: 3 one 1
#> 23: 3 one 1
#> 24: 3 one 1
#> 25: 3 one 1
#> 26: 3 one 1
#> 27: 3 one 1
#> 28: 3 two 2
#> 29: 3 two 2
#> 30: 3 three 3
#> id V1 bin
您可以随时将 one
映射到 1
关于R 根据数据帧中的值重复值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/73413403/