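I have a large data frame in R with more than 200 variables, most of them character, and I want to convert them to factors. I have prepared all the levels and labels in a separate data frame. For a variable Var1, the corresponding levels and labels are stored in Var1_v and Var1_b; for example, for the variable Gender the levels and labels are named Gender_v and Gender_b.
Here is a sample of my data: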
df <- data.frame(Gender = c("2","2","1","2"),
                 AgeG = c("3","1","4","2"))
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v = c("1","2","3","4"),
                  AgeG_b = c("<25","25-60","65-80",">80"))
df$Gender <- factor(df$Gender, levels = fct$Gender_v, labels = fct$Gender_b, exclude = NULL)
df$AgeG <- factor(df$AgeG, levels = fct$AgeG_v, labels = fct$AgeG_b, exclude = NULL)
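Is there a way to automate this process so that the factors (levels and labels) are applied to the corresponding variables without my having to handle each variable individually?
I think this can probably be done with a function such as pmap.
My goal is to minimize the effort this process requires. Is there also a better way to prepare the labels and levels?
Any help is much appreciated.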
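Best answer
I would solve this by simply restructuring the code: automation suggests a loop. Keep in mind that the more columns you add, the longer the loop takes. I believe the lookup fct[[paste0(names(df[i]),"_v")]] could be refactored into a small function to make it look nicer.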
> df <- data.frame (Gender = c("2","2","1","2"),
+ AgeG = c("3","1","4","2"))
>
> fct <- data.frame (Gender_v = c("1", "2"),
+ Gender_b = c("Male", "Female"),
+ AgeG_v = c("1","2","3","4"),
+ AgeG_b = c("<25","25-60","65-80",">80"))
>
> for(i in 1:ncol(df)){
+   le <- fct[[paste0(names(df[i]),"_v")]]
+   la <- fct[[paste0(names(df[i]),"_b")]]
+   df[,i] <- factor(df[,i], levels = le, labels = la, exclude = NULL)
+ }
>
> df
  Gender  AgeG
1 Female 65-80
2 Female   <25
3   Male   >80
4 Female 25-60
>
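The refactoring into a small function mentioned above could look roughly like the following. This is only a sketch: the helper name label_column and its suffix arguments are made up for illustration, and it assumes the lookup columns follow the <name>_v / <name>_b naming from the question. Columns without a matching pair in fct are returned unchanged.

```r
# Hypothetical helper (name and arguments are illustrative, not from the
# original answer): look up the levels and labels for one column name in
# `fct` and return that column converted to a factor.
label_column <- function(x, name, fct, level_suffix = "_v", label_suffix = "_b") {
  le <- fct[[paste0(name, level_suffix)]]
  la <- fct[[paste0(name, label_suffix)]]
  # Leave the column untouched when no matching lookup columns exist.
  if (is.null(le) || is.null(la)) return(x)
  factor(x, levels = le, labels = la, exclude = NULL)
}

df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"))

fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"))

# Map() walks the columns and their names in parallel; assigning into df[]
# replaces every column while keeping the data frame structure.
df[] <- Map(label_column, df, names(df), MoreArgs = list(fct = fct))
```

Base R's Map() plays the role the question had in mind for pmap here, so no extra package is needed. Whether this reads better than the explicit loop is largely a matter of taste.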
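Edit: here is a version with an added if condition, so that only columns whose names end in _f are converted (str_remove comes from the stringr package)
> library(stringr)
> df <- data.frame (Gender_f = c("2","2","1","2"),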
+ AgeG_f = c("3","1","4","2"),
+ AgeN = c(70,15,96,30))
>
> fct <- data.frame (Gender_v = c("1", "2"),
+ Gender_b = c("Male", "Female"),
+ AgeG_v = c("1","2","3","4"),
+ AgeG_b = c("<25","25-60","65-80",">80"))
>
> for(i in 1:ncol(df)){
+   if(endsWith(names(df[i]),"_f")){
+     name <- str_remove(names(df[i]),"_f")
+     le <- fct[[paste0(name,"_v")]]
+     la <- fct[[paste0(name,"_b")]]
+     df[,i] <- factor(df[,i], levels = le, labels = la, exclude = NULL)
+   }
+ }
>
> df
  Gender_f AgeG_f AgeN
1   Female  65-80   70
2   Female    <25   15
3     Male    >80   96
4   Female  25-60   30
>
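On the question of a better way to prepare the levels and labels: the wide fct data frame only builds in this example because data.frame() recycles the two Gender entries to match the four AgeG entries. With lengths that do not divide evenly (say 3 and 4 levels), data.frame() stops with an error. One possible alternative, sketched below, is a long lookup table with one row per variable and level. The table name lookup and its column names variable, value and label are assumptions for illustration, not part of the original post.

```r
# Long-format lookup (names are illustrative): one row per (variable, level)
# pair, so each variable may have any number of levels.
lookup <- data.frame(
  variable = c("Gender", "Gender", "AgeG", "AgeG", "AgeG", "AgeG"),
  value    = c("1", "2", "1", "2", "3", "4"),
  label    = c("Male", "Female", "<25", "25-60", "65-80", ">80"),
  stringsAsFactors = FALSE
)

df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"),
                 AgeN   = c(70, 15, 96, 30))

# Only touch columns that actually appear in the lookup table;
# AgeN has no entry there and stays numeric.
for (v in intersect(names(df), unique(lookup$variable))) {
  rows <- lookup[lookup$variable == v, ]
  df[[v]] <- factor(df[[v]], levels = rows$value, labels = rows$label)
}
```

With this layout there is no need for an _f suffix on the column names, because the lookup table itself says which variables should become factors.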
Regarding "r - How to automatically add factors to variables in a large data frame in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70793773/