r - How to automatically add factors to variables in a large data frame in R

Tags: r function label factors pmap

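I have a large data frame in R with more than 200 variables, mostly character, that I want to convert to factors. I have prepared all the levels and labels in a separate data frame. For a variable Var1, the corresponding levels and labels are named Var1_v and Var1_b; for example, for the variable Gender they are named Gender_v and Gender_b.

Here is a sample of my data: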

df <- data.frame (Gender = c("2","2","1","2"),
                  AgeG = c("3","1","4","2"))

fct <- data.frame (Gender_v  = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v = c("1","2","3","4"),
                  AgeG_b = c("<25","25-60","65-80",">80"))

df$Gender <- factor(df$Gender, levels = fct$Gender_v, labels = fct$Gender_b, exclude = NULL)
df$AgeG <- factor(df$AgeG, levels = fct$AgeG_v, labels = fct$AgeG_b, exclude = NULL)

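Is there a way to automate this so that the factors (levels and labels) are applied to the corresponding variables without my having to do each one individually? I think it can probably be done with a function via pmap.

My goal is to minimize the work this process requires. Is there also a better way to prepare the labels and levels?

Thank you very much for your help.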

Best answer

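I solved this by simply refactoring the code into a loop. The more data you add, the longer it will take. I believe the fct[[paste0(names(df[i]),"_v")]] lookup could be refactored into a small function, which would look nicer.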

> df <- data.frame (Gender = c("2","2","1","2"),
+                   AgeG = c("3","1","4","2"))
> 
> fct <- data.frame (Gender_v  = c("1", "2"),
+                    Gender_b = c("Male", "Female"),
+                    AgeG_v = c("1","2","3","4"),
+                    AgeG_b = c("<25","25-60","65-80",">80"))
> 
> for(i in 1:ncol(df)){
+   
+   le <- fct[[paste0(names(df[i]),"_v")]]
+   
+   la <- fct[[paste0(names(df[i]),"_b")]]
+   
+   df[,i] <- factor(df[,i],levels = le ,labels = la,exclude = NULL)
+   
+ }
> 
> df
  Gender  AgeG
1 Female 65-80
2 Female   <25
3   Male   >80
4 Female 25-60
>
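
The small function the answer alludes to could be sketched roughly as follows. This is only an illustration in base R: the helper name `apply_labels` and the extra numeric column `AgeN` are assumptions, not part of the original answer. It also drops repeated level/label pairs, because data.frame() recycles the two Gender entries to match the four AgeG rows.

```r
# Hypothetical helper (name not from the original answer): looks up the
# "<name>_v" / "<name>_b" columns in `lookup` and converts `x` to a factor.
apply_labels <- function(x, name, lookup) {
  le <- lookup[[paste0(name, "_v")]]
  la <- lookup[[paste0(name, "_b")]]
  # Leave the column untouched when no matching level/label pair exists.
  if (is.null(le) || is.null(la)) return(x)
  # data.frame() recycled shorter columns, so keep each level only once.
  keep <- !duplicated(le)
  factor(x, levels = le[keep], labels = la[keep], exclude = NULL)
}

df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"),
                 AgeN   = c(70, 15, 96, 30))

fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"))

# Map() walks the columns and their names in parallel;
# assigning into df[] keeps the result a data frame.
df[] <- Map(apply_labels, df, names(df), MoreArgs = list(lookup = fct))
str(df)
```

Columns without a matching pair in fct, such as AgeN here, pass through unchanged, so no naming suffix is needed to mark which columns to convert.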

Edit: here is a version with an added if condition, so that only columns whose names end in _f are converted


> library(stringr)  # provides str_remove()
> 
> df <- data.frame (Gender_f = c("2","2","1","2"),
+                   AgeG_f = c("3","1","4","2"),
+                   AgeN = c(70,15,96,30))
> 
> fct <- data.frame (Gender_v  = c("1", "2"),
+                    Gender_b = c("Male", "Female"),
+                    AgeG_v = c("1","2","3","4"),
+                    AgeG_b = c("<25","25-60","65-80",">80"))
> 
> for(i in 1:ncol(df)){
+ 
+   # only convert columns flagged with the "_f" suffix
+   if(endsWith(names(df[i]),"_f")){
+     
+     # strip the trailing "_f" to recover the base name used in fct
+     name <- str_remove(names(df[i]),"_f$")
+   
+     le <- fct[[paste0(name,"_v")]]
+    
+     la <- fct[[paste0(name,"_b")]]
+      
+     df[,i] <- factor(df[,i],levels = le ,labels = la,exclude = NULL)
+   
+   }
+      
+ }
> 
> df
  Gender_f AgeG_f AgeN
1   Female  65-80   70
2   Female    <25   15
3     Male    >80   96
4   Female  25-60   30
> 
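
On the question of a better way to prepare the levels and labels: one possible alternative, offered only as a sketch, is a named list with one entry per variable instead of a wide data frame. data.frame() needs columns of compatible lengths (the two Gender entries above are silently recycled to four rows, and lengths such as 2 and 3 would raise an error), whereas a list has no such constraint. The name `value_labels` and its layout are assumptions made for illustration.

```r
# Hypothetical layout: one named character vector per variable,
# where names are the codes stored in the data and values are the labels.
value_labels <- list(
  Gender = c("1" = "Male", "2" = "Female"),
  AgeG   = c("1" = "<25", "2" = "25-60", "3" = "65-80", "4" = ">80")
)

df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"),
                 AgeN   = c(70, 15, 96, 30))

# Only touch columns that have an entry in the lookup list.
for (nm in intersect(names(df), names(value_labels))) {
  lab <- value_labels[[nm]]
  df[[nm]] <- factor(df[[nm]], levels = names(lab),
                     labels = unname(lab), exclude = NULL)
}

str(df)
```

With this layout each variable can have any number of levels, and columns that are absent from the list (AgeN here) are simply skipped.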

Regarding "r - How to automatically add factors to variables in a large data frame in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70793773/
