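I have several datasets whose number of columns and/or column names may or may not differ. I want to produce a single dataset with unified column names.
Consider the following example: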
df1 <- tibble::tribble(
~v1a, ~v2, ~v3, ~v4, ~v5,
"A", 4, "Z1", "a1", "ti",
"B", 3, "Y2", "b2", "tu",
"C", 2, "X3", "c3", "to",
"D", 1, "W4", "d4", "ta"
)
df2 <- tibble::tribble(
~v1a, ~v2, ~v3, ~v4,
"D", 1, "W4", "d4",
"C", 2, "X3", "c3",
"B", 3, "Y2", "b2",
"A", 4, "Z1", "a1"
)
df3 <- tibble::tribble(
~V1, ~V2, ~V4,
"A", 4, "a1",
"B", 3, "b2",
"C", 2, "c3",
"D", 1, "d4"
)
df4 <- tibble::tribble(
~V1a, ~V2a, ~V3a, ~V4a,
"A", 4, "Z1", "a1",
"B", 3, "Y2", "b2",
"C", 2, "X3", "c3",
"D", 1, "W4", "d4"
)
If I run bind_rows(df1, df2, df3, df4), I get a dataset with 12 variables, whereas I want one with only 5 variables, like this:
expected_df <- tibble::tribble(
~var1, ~var2, ~var3, ~var4, ~var5,
"A", 4L, "Z1", "a1", "ti",
"B", 3L, "Y2", "b2", "tu",
"C", 2L, "X3", "c3", "to",
"D", 1L, "W4", "d4", "ta",
"D", 1L, "W4", "d4", NA,
"C", 2L, "X3", "c3", NA,
"B", 3L, "Y2", "b2", NA,
"A", 4L, "Z1", "a1", NA,
"A", 4L, NA, "a1", NA,
"B", 3L, NA, "b2", NA,
"C", 2L, NA, "c3", NA,
"D", 1L, NA, "d4", NA,
"A", 4L, "Z1", "a1", NA,
"B", 3L, "Y2", "b2", NA,
"C", 2L, "X3", "c3", NA,
"D", 1L, "W4", "d4", NA
)
How can I achieve this?
I think a possible starting point is to create a kind of correspondence table with the "old" and "new" column names:
col_names <- tibble::tribble(
~old, ~new,
"v1a", "var1",
"v2", "var2",
"v3", "var3",
"v4", "var4",
"v5", "var5",
"V1", "var1",
"V2", "var2",
"V4", "var4",
"V1a", "var1",
"V2a", "var2",
"V3a", "var3",
"V4a", "var4"
)
and then conditionally rename the columns of the various datasets, but I don't know how to do that. Do you have any ideas?
Thanks a lot!
Best answer
Get the data frames into a list, change the column names using match, and combine them into one data frame using map_df.
library(dplyr)
library(purrr)
# Collect df1..df4 into a named list, rename each one via the lookup table, then row-bind.
map_df(mget(paste0('df', 1:4)),
       ~.x %>% rename_with(~col_names$new[match(., col_names$old)]))
# var1 var2 var3 var4 var5
# <chr> <dbl> <chr> <chr> <chr>
# 1 A 4 Z1 a1 ti
# 2 B 3 Y2 b2 tu
# 3 C 2 X3 c3 to
# 4 D 1 W4 d4 ta
# 5 D 1 W4 d4 NA
# 6 C 2 X3 c3 NA
# 7 B 3 Y2 b2 NA
# 8 A 4 Z1 a1 NA
# 9 A 4 NA a1 NA
#10 B 3 NA b2 NA
#11 C 2 NA c3 NA
#12 D 1 NA d4 NA
#13 A 4 Z1 a1 NA
#14 B 3 Y2 b2 NA
#15 C 2 X3 c3 NA
#16 D 1 W4 d4 NA
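One thing to watch with the match approach: a column name that is missing from col_names$old makes match return NA, which is not a usable column name. Below is a minimal sketch of a guarded variant, assuming you would rather keep unmapped names unchanged. The helper unify_names, the shortened lookup table, and the toy data frames df_a and df_b are hypothetical and only there for illustration. It also uses map() followed by bind_rows() instead of map_df(), since recent purrr releases mark map_df() as superseded.

```r
library(dplyr)
library(purrr)

# Shortened copy of the lookup table from the question.
col_names <- tibble::tribble(
  ~old,  ~new,
  "v1a", "var1",
  "V1",  "var1",
  "v2",  "var2",
  "V2",  "var2",
  "V4",  "var4"
)

# Hypothetical helper: translate old names to new ones and
# fall back to the original name when there is no match.
unify_names <- function(nms, lookup = col_names) {
  idx <- match(nms, lookup$old)
  ifelse(is.na(idx), nms, lookup$new[idx])
}

# Toy inputs; "extra" is deliberately absent from the lookup table.
df_a <- tibble::tibble(v1a = c("A", "B"), v2 = c(4, 3), extra = c(TRUE, FALSE))
df_b <- tibble::tibble(V1 = c("C", "D"), V2 = c(2, 1), V4 = c("c3", "d4"))

# Rename every data frame, then stack them; missing columns are filled with NA.
combined <- list(df_a, df_b) %>%
  map(~ rename_with(.x, unify_names)) %>%
  bind_rows()

names(combined)
```

With this fallback, an unexpected column survives under its original name and can be spotted in the result, rather than stopping the whole pipeline.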
Regarding "r - unifying column names across datasets", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68409114/