r - Combining a data file and a label file to get a single labelled data frame in R

I have two data frames: one holds survey data (data.csv) and the other holds label data (label.csv). Here is some sample data (my real data has around 150 variables):

#sample data
df <- tibble::tribble(
  ~id, ~House_member,  ~dob, ~age_quota, ~work, ~sex, ~pss,
   1L,            4L, 1983L,         2L,    2L,    1,    1,
   2L,            1L, 1940L,         7L,    2L,    1,    2,
   3L,            2L, 1951L,         5L,    6L,    1,    1,
   4L,            4L, 1965L,         2L,    2L,    1,    4,
   5L,            3L, 1965L,         2L,    3L,    1,    1,
   6L,            1L, 1951L,         3L,    1L,    1,    3,
   7L,            1L, 1955L,         1L,    1L,    1,    3,
   8L,            4L, 1982L,         2L,    2L,    2,    5,
   9L,            2L, 1990L,         2L,    4L,    2,    3,
  10L,            2L, 1953L,         3L,    2L,    2,    4
)

#sample label data
label <- tibble::tribble(
  ~variable,      ~value, ~label,
  "House_member", NA,     "How many people live with you?",
  "House_member", 1L,     "1 person",
  "House_member", 2L,     "2 persons",
  "House_member", 3L,     "3 persons",
  "House_member", 4L,     "4 persons",
  "House_member", 5L,     "5 persons",
  "House_member", 6L,     "6 persons",
  "House_member", 7L,     "7 persons",
  "House_member", 8L,     "8 persons",
  "House_member", 9L,     "9 persons",
  "House_member", 10L,    "10 or more",
  "dob",          NA,     "date of brith",
  "age_quota",    NA,     "age_quota",
  "age_quota",    1L,     "10-14",
  "age_quota",    2L,     "15-19",
  "age_quota",    3L,     "20-29",
  "age_quota",    4L,     "30-39",
  "age_quota",    5L,     "40-49",
  "age_quota",    6L,     "50-70",
  "age_quota",    7L,     "70 +",
  "work",         NA,     "what is your occupation?",
  "work",         1L,     "full time",
  "work",         2L,     "part time",
  "work",         3L,     "retired",
  "work",         4L,     "student",
  "work",         5L,     "housewife",
  "work",         6L,     "unemployed",
  "work",         7L,     "other",
  "work",         8L,     "kid under 15",
  "sex",          NA,     "gender?",
  "sex",          1L,     "Man",
  "sex",          2L,     "Woman",
  "pss",          NA,     "How often do you use PS?",
  "pss",          1L,     "Daily",
  "pss",          2L,     "several times per week",
  "pss",          3L,     "once per week",
  "pss",          4L,     "several time per month",
  "pss",          5L,     "Rarly"
)

I wonder whether there is a way to combine these files into a single labelled data frame, e.g. in SPSS-style format (dbl+lbl). I know the labelled package can add value labels to an unlabelled vector, as in this example:

v <- labelled::labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, maybe = 2, no = 3))

I am hoping for a better/faster way than adding labels to each variable one by one.
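If the data and labels start out as the two CSV files mentioned in the question, they first have to be read into data frames. The following is only a minimal sketch: the file contents are hypothetical stand-ins written to a temporary directory so that the example runs on its own, and the column names (variable, value, label) are assumed to match the sample label data. The names df_csv and label_csv are illustrative.

```r
# Hypothetical stand-ins for data.csv and label.csv, written to a temp dir
tmp <- tempdir()
data_path  <- file.path(tmp, "data.csv")
label_path <- file.path(tmp, "label.csv")

write.csv(data.frame(id = 1:2, sex = c(1, 2)), data_path, row.names = FALSE)
write.csv(
  data.frame(
    variable = "sex",
    value    = c(NA, 1, 2),
    label    = c("gender?", "Man", "Woman")
  ),
  label_path,
  row.names = FALSE
)

# Read both files; "NA" cells in the value column come back as missing values
df_csv    <- read.csv(data_path, stringsAsFactors = FALSE)
label_csv <- read.csv(label_path, stringsAsFactors = FALSE)

str(label_csv)
```

With real files, only the two read.csv() calls are needed, pointed at the actual paths.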



		            	
		            		
Best answer

Another solution, using imap_dfc:
library(tidyverse)

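# For each column (.x) and its name (.y): pick the matching rows of `label`,
# turn them into a named vector (names = label text, values = codes),
# and attach that vector as value labels.
df %>% imap_dfc(~{
  label[label$variable == .y, c('label', 'value')] %>%
    deframe() %>% # to named vector
    haven::labelled(.x, .)
})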

# A tibble: 10 x 7
          id  House_member       dob age_quota           work       sex                        pss
   <int+lbl>     <int+lbl> <int+lbl> <int+lbl>      <int+lbl> <dbl+lbl>                  <dbl+lbl>
 1         1 4 [4 persons]      1983 2 [15-19] 2 [part time]  1 [Man]   1 [Daily]                 
 2         2 1 [1 person]       1940 7 [70 +]  2 [part time]  1 [Man]   2 [several times per week]
 3         3 2 [2 persons]      1951 5 [40-49] 6 [unemployed] 1 [Man]   1 [Daily]                 
 4         4 4 [4 persons]      1965 2 [15-19] 2 [part time]  1 [Man]   4 [several time per month]
 5         5 3 [3 persons]      1965 2 [15-19] 3 [retired]    1 [Man]   1 [Daily]                 
 6         6 1 [1 person]       1951 3 [20-29] 1 [full time]  1 [Man]   3 [once per week]         
 7         7 1 [1 person]       1955 1 [10-14] 1 [full time]  1 [Man]   3 [once per week]         
 8         8 4 [4 persons]      1982 2 [15-19] 2 [part time]  2 [Woman] 5 [Rarly]                 
 9         9 2 [2 persons]      1990 2 [15-19] 4 [student]    2 [Woman] 3 [once per week]         
10        10 2 [2 persons]      1953 3 [20-29] 2 [part time]  2 [Woman] 4 [several time per month]
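This uses tibble::deframe and haven::labelled, both of which are installed with the tidyverse (haven is not attached by library(tidyverse), hence the haven:: prefix).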
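One thing to be aware of with the approach above: the rows of label whose value is NA (the question text) are passed along with the other rows, so they end up as a value label for NA rather than as the variable label. If that matters, a variant along the following lines might help. It is only a sketch under assumptions: df_small, label_small and the helper add_labels() are hypothetical names introduced here, and it relies on the label argument of haven::labelled() to store the question text.

```r
library(tibble)
library(purrr)

# Hypothetical cut-down versions of the sample data
df_small <- tribble(
  ~id, ~sex,
  1L,  1,
  2L,  2
)
label_small <- tribble(
  ~variable, ~value, ~label,
  "sex",     NA,     "gender?",
  "sex",     1L,     "Man",
  "sex",     2L,     "Woman"
)

# Hypothetical helper: label one column using the rows that match its name
add_labels <- function(x, name, label_tbl) {
  rows <- label_tbl[label_tbl$variable == name, ]
  # Rows with a missing value carry the question text (variable label)
  var_label <- rows$label[is.na(rows$value)]
  # The remaining rows carry the value labels
  val_rows <- rows[!is.na(rows$value), c("label", "value")]
  haven::labelled(
    x,
    labels = if (nrow(val_rows) > 0) deframe(val_rows) else NULL,
    label  = if (length(var_label) > 0) var_label[[1]] else NULL
  )
}

labelled_small <- imap_dfc(df_small, ~ add_labels(.x, .y, label_small))

attr(labelled_small$sex, "label")   # variable label
attr(labelled_small$sex, "labels")  # value labels only
```

Columns without any matching rows in the label table, such as id here, come back as labelled vectors with no labels attached.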
Speed comparison after replacing filter/select with direct indexing into label:

Waldi <- function() {
df %>% imap_dfc(~{ 
    label[label$variable==.y,c('label','value')] %>%
      deframe() %>% # to named vector
      haven::labelled(.x,.)})}

Waldi_old <- function() {   
    df %>% imap_dfc(~{ 
      label %>% filter(variable==.y) %>%
        select(label, value) %>%
        deframe() %>% # to named vector
        haven::labelled(.x,.)
    })}

#EDIT : Included TIC3() for-loop solution
# TIC1(), TIC3(), Anil() and Sinh() wrap solutions from other answers (not reproduced here)

microbenchmark::microbenchmark(TIC3(),Waldi(),Anil(),TIC1(),Waldi_old(),Sinh())
Unit: microseconds
        expr     min       lq      mean   median       uq     max neval   cld
      TIC3()   688.0   871.80   982.280   920.95  1005.55  1801.6   100 a    
     Waldi()  1345.5  1543.60  1804.758  1635.45  1893.75  4306.8   100  b   
      Anil()  4006.8  4476.65  5188.519  4862.95  5439.10 10163.6   100   c  
      TIC1()  3898.2  4278.80  5009.927  4774.95  5277.05 12916.2   100   c  
 Waldi_old() 18712.3 20091.75 21756.140 20609.35 22169.75 33359.8   100    d 
      Sinh() 22730.9 24093.45 25931.412 24946.00 26614.00 38735.3   100     e
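The timings above suggest that the per-column lookup in label accounts for much of the cost: the filter/select version has a median about twelve times that of the direct-indexing version. One way to reduce the lookup further might be to split the label table once and then loop over the columns. The sketch below illustrates that idea only; it is not the TIC3() code, which is not shown here, and df_small, label_small, label_by_var and labelled_loop are hypothetical names.

```r
library(tibble)

# Hypothetical cut-down versions of the sample data
df_small <- tribble(
  ~id, ~sex, ~pss,
  1L,  1,    1,
  2L,  2,    5
)
label_small <- tribble(
  ~variable, ~value, ~label,
  "sex",     1L,     "Man",
  "sex",     2L,     "Woman",
  "pss",     1L,     "Daily",
  "pss",     5L,     "Rarly"
)

# Split the label table once, so each column only needs a list lookup
label_by_var <- split(label_small, label_small$variable)

labelled_loop <- df_small
for (nm in intersect(names(df_small), names(label_by_var))) {
  rows <- label_by_var[[nm]]
  labelled_loop[[nm]] <- haven::labelled(
    df_small[[nm]],
    labels = setNames(rows$value, rows$label) # names = label text, values = codes
  )
}

labelled_loop
```

Unlike the imap_dfc version, this leaves columns that have no entry in the label table (id here) untouched instead of converting them to empty labelled vectors.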

			            

					

					
					
Regarding "r - Combining a data file and a label file to get a single labelled data frame in R", we found a similar question on Stack Overflow:
https://stackoverflow.com/questions/67504200/
