As a simplified example of my problem, suppose I have four data.tables dt1, ..., dt4, all with the same structure:
head(dt1)
date x y
1: 2000-10-01 0.4527087 -0.11590788
2: 2001-10-01 0.7200252 -0.55722270
3: 2002-10-01 -1.3804472 -1.47030087
4: 2003-10-01 -0.1380225 2.34157766
5: 2004-10-01 -0.9288675 -1.32993998
6: 2005-10-01 -0.9592633 0.76316150
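That is, they all have three columns named date, x and y. The output I want is a single data.table merged on date with five columns: date, followed by the x column from each individual table, renamed to reflect its source data.table: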
head(desired_output)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 0.4527087 -0.11590788 1.1581946 -1.5159040
2: 2001-10-01 0.7200252 -0.55722270 -1.6247254 -0.3325556
3: 2002-10-01 -1.3804472 -1.47030087 -0.9766309 -0.2368857
4: 2003-10-01 -0.1380225 2.34157766 1.1831091 -0.4399184
5: 2004-10-01 -0.9288675 -1.32993998 0.8716144 -0.4086229
6: 2005-10-01 -0.9592633 0.76316150 -0.8860816 -0.4299365
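I assume this can be done somehow with the suffixes argument of merge.data.table. I tried adapting mergeDTs from this answer, without success. A solution that successfully adapts mergeDTs (or simply a function that can be applied to a list of data.tables) would be great.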
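For two tables the suffixes argument does exactly this. The sketch below uses two small hypothetical tables, a and b, with the same columns as dt1 and dt2; only the key and the value column are kept so that x is the only clashing name. With a third table the same call no longer generalizes, because suffixes only applies to columns that clash in that particular merge step, which is why the answers below merge with c("", suffix) and rename the first column at the end.

```r
library(data.table)

# Hypothetical miniature versions of dt1 and dt2
a <- data.table(date = as.Date(c("2000-10-01", "2001-10-01")), x = c(1, 2), y = c(3, 4))
b <- data.table(date = as.Date(c("2000-10-01", "2001-10-01")), x = c(5, 6), y = c(7, 8))

# Keep only date and x, then let `suffixes` rename the clashing x column
two <- merge(a[, .(date, x)], b[, .(date, x)], by = "date",
             suffixes = c("_dt1", "_dt2"))
names(two)
```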
I know about this very slick dplyr/purrr answer, but would prefer a data.table solution.
Example data
library(data.table)
dt1 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt2 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt3 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt4 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
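The four copy-pasted calls can also be generated in one go. This is a sketch under two assumptions: a fixed seed (the value 1 is arbitrary) so the random x and y are reproducible, and a hypothetical named list dts that holds the tables, which is convenient for the list-based solutions further down. list2env() then exposes them under the original names dt1, ..., dt4.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only to make rnorm() reproducible
dates <- seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years")

# Build dt1, ..., dt4 as a named list instead of four separate calls
dts <- setNames(
  lapply(1:4, function(i) data.table(date = dates, x = rnorm(11), y = rnorm(11))),
  paste0("dt", 1:4)
)

# Optionally make them available as dt1, ..., dt4 in the global environment
invisible(list2env(dts, envir = globalenv()))
```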
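Solution
Below I have turned B. Christian Kamgang's answer into a function (which makes it easier to adapt to my actual problem) and removed its dependency on R's new native pipe (since my organization has not upgraded yet):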
merge_select <- function(on, vars, ..., suffix = "_") {
  dts <- list(...)
  # Name each table after the expression passed in, e.g. "dt1"
  names(dts) <- sapply(as.list(substitute(list(...)))[-1L], deparse)
  nv <- length(vars)
  ndt <- length(dts)
  # One vector of old names per table
  old_cols <- rep(list(vars), ndt)
  # One vector of new names per table, e.g. c("x_dt1", "y_dt1")
  new_names <- paste0(vars, suffix, rep(names(dts), each = nv))
  new_cols <- split(new_names, rep(seq_len(ndt), each = nv))
  # Keep only the key and the selected variables of each table
  sep_cols <- lapply(dts, function(x) subset(x, select = c(on, vars)))
  Reduce(f = function(x, y) merge(x, y, by = on),
         Map(f = setnames, sep_cols, old_cols, new_cols))
}
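The least obvious line in merge_select() is the one that names the tables. The sketch below isolates it in a hypothetical helper, arg_names(), to show what it returns. Because the names come from the code as written at the call site rather than from the data, passing an expression such as dts[[1]] yields the name "dts[[1]]", so the resulting columns would be called x_dts[[1]] and so on.

```r
# Hypothetical helper isolating the naming trick used in merge_select()
arg_names <- function(...) {
  # substitute(list(...)) captures the unevaluated call, e.g. list(dt1, dt2);
  # dropping its first element (the function name `list`) leaves the arguments,
  # which deparse() turns into strings
  sapply(as.list(substitute(list(...)))[-1L], deparse)
}

arg_names(dt1, dt2, dt3)   # the tables need not even exist: nothing is evaluated
arg_names(dts[[1]])
```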
Applied to my case, this gives:
merge_select("date", "x", dt1, dt2, dt3, dt4)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 -0.6365707 0.11804268 -0.01084163 -0.88127011
2: 2001-10-01 -0.2533127 -3.16924568 0.45746415 0.69742537
3: 2002-10-01 2.3069266 -0.82670409 -0.54236745 -1.49613384
4: 2003-10-01 0.7075547 -0.91809007 -0.67888707 -0.26106146
5: 2004-10-01 -0.7165651 -0.45711888 -0.83903416 1.45113260
6: 2005-10-01 0.5703561 0.24587897 0.13862020 0.33928202
7: 2006-10-01 -0.6258097 -0.77652389 -0.49252474 -0.80460241
8: 2007-10-01 -0.4600565 0.55612959 0.86749410 -1.30850411
9: 2008-10-01 -0.8841649 -0.48113848 -1.55858406 0.83076846
10: 2009-10-01 -0.6262272 -0.73618265 0.13350581 0.06640803
11: 2010-10-01 0.1406454 0.08994779 1.28450204 -1.18329081
This solution also works with multiple variables, e.g.:
merge_select("date", c("x","y"), dt1, dt2, dt3, dt4)
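A different data.table technique, not the one used above, avoids merging altogether: stack the tables into long format with rbindlist() and reshape them wide with dcast(). This is a sketch assuming the tables are held in a named list (called dts here; the data is regenerated randomly so the block runs on its own). With several value.var columns, dcast() names the results variable_source, i.e. x_dt1, ..., y_dt4.

```r
library(data.table)

# Assumed example data shaped like dt1, ..., dt4 (random values)
dates <- seq(as.Date("2000-10-01"), as.Date("2010-10-01"), by = "years")
dts <- setNames(
  lapply(1:4, function(i) data.table(date = dates, x = rnorm(11), y = rnorm(11))),
  paste0("dt", 1:4)
)

# Stack the tables; idcol = "src" records the list name each row came from
long <- rbindlist(dts, idcol = "src")

# One column per (variable, source) pair: x_dt1, ..., x_dt4, y_dt1, ..., y_dt4
wide <- dcast(long, date ~ src, value.var = c("x", "y"))
```

Two differences from the merge-based answers are worth keeping in mind. dcast() keeps every date that appears in any table and fills gaps with NA, whereas merge() defaults to an inner join. With a single value.var, the new columns are named after src alone (dt1, ..., dt4), so the x_ prefix has to be added to src before reshaping; also, sources are ordered as sorted strings, so dt10 would come before dt2.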
Best answer
Here is one possible approach that accumulates the merge results in a simple for loop:
library(data.table)
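dt <- dt1[, .(date, x)]
for(i in 2:4) {
  # suffixes = c("", ...) leaves the accumulated x unchanged and renames the incoming x to x_dt<i>
  dt <- merge(dt, get(paste0("dt", i))[, .(date, x)], by = "date", suffixes = c("", paste0("_dt", i)))
}
# The first table's column is still called x, so rename it last
setnames(dt, old = "x", new = "x_dt1")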
head(dt)
#> date x_dt1 x_dt2 x_dt3 x_dt4
#> 1: 2000-10-01 -1.5035218 2.0463775 -0.120544283 -0.5662290
#> 2: 2001-10-01 0.5977386 -0.1968421 -0.840102174 1.2412272
#> 3: 2002-10-01 -0.9100557 -0.1687148 -1.738526471 1.3685767
#> 4: 2003-10-01 0.7027232 0.9009135 -0.247273205 1.3135718
#> 5: 2004-10-01 0.5269265 0.6176381 -0.007662592 -0.2928206
#> 6: 2005-10-01 -0.8350406 -0.7343245 -0.643701996 2.3068948
Alternatively, use Reduce() to accumulate the merge results:
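# init starts from dt1, so the first step merges dt1 with itself: its copy of x becomes x_dt1
# and the unsuffixed x left over from init is dropped at the end
Reduce(
  f = function(dt, dti) merge(dt, get(dti)[, .(date, x)], by = "date", suffixes = c("", paste0("_", dti))),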
  x = paste0("dt", 1:4),
  init = dt1[, .(date, x)]
)[, x := NULL][]
Note: to get rid of the get() calls, we can collect all the data.tables in a list before merging, or write a small wrapper function, e.g.
merge_dts <- function(...) {
  dts <- list(...)
  dt <- dts[[1]][, .(date, x)]
  for(i in seq_along(dts)[-1]) {
    dt <- merge(dt, dts[[i]][, .(date, x)], by = "date", suffixes = c("", paste0("_dt", i)))
  }
  setnames(dt, old = "x", new = "x_dt1")
  return(dt)
}
merge_dts(dt1, dt2, dt3, dt4)
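The note's other option, collecting the tables in a list first, can be sketched with mget(), which returns a list named after the objects it fetches. This is an assumption-laden sketch: the tables are regenerated randomly so it runs standalone, and renamed and out are hypothetical names. Each slimmed-down copy has its x renamed to x_<name> before merging, so no suffixes are needed; note that merge_dts() above suffixes by argument position, whereas this version uses the object names.

```r
library(data.table)

# Assumed example data; in the question dt1, ..., dt4 already exist
dates <- seq(as.Date("2000-10-01"), as.Date("2010-10-01"), by = "years")
for (i in 1:4) assign(paste0("dt", i), data.table(date = dates, x = rnorm(11), y = rnorm(11)))

# mget() collects the tables into a list named "dt1", ..., "dt4"
dts <- mget(paste0("dt", 1:4))

# Rename x to x_<name> in a copy of each table, then merge pairwise on date
renamed <- Map(function(d, nm) setnames(d[, .(date, x)], "x", paste0("x_", nm)),
               dts, names(dts))
out <- Reduce(function(a, b) merge(a, b, by = "date"), renamed)
```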
Source: the Stack Overflow question "r - Merge multiple data.tables and rename columns to reflect their source": https://stackoverflow.com/questions/73942883/