R - Reorder columns based on matching a template

Tags: r list sorting

I have a large dataset that looks like this:

     V1       V2   V3         V4
1 Sleep Domestic  Eat Child Care
2 Sleep Domestic  Eat       Paid
3 Sleep Domestic  Eat Child Care
4 Sleep      Eat Paid       <NA>

What I want to do is reorder the columns based on a "template":
["Sleep", "Eat", "Domestic", "Paid", "Child care"]

to get this output:
   V1    V2       V3      V4            V5
Sleep   Eat Domestic      NA    Child Care
Sleep   Eat Domestic    Paid            NA
Sleep   Eat Domestic      NA    Child Care
Sleep   Eat       NA    Paid            NA

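So column 1 holds Sleep, column 2 holds Eat, and so on. Where a row does not contain a template value, that column is NA.

I don't know where to start.
Any ideas?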

Data
x = structure(list(V1 = c("Sleep", "Sleep", "Sleep", "Sleep"), V2 = c("Domestic", 
"Domestic", "Domestic", "Eat"), V3 = c("Eat", "Eat", "Eat", "Paid"
), V4 = c("Child Care", "Paid", "Child Care", NA)), .Names = c("V1", 
"V2", "V3", "V4"), row.names = c(NA, 4L), class = "data.frame")

template = c('Sleep', 'Eat', 'Domestic', 'Paid', 'Child care')

Best answer

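For each template value, use rowSums to check whether it appears anywhere in each row, then assemble the resulting columns back into a data frame: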

template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")
# I've fixed the template so that the case of 'Child Care' matches the values in the data

data.frame(lapply(
  setNames(template, seq_along(template)),
  function(v) c(NA, v)[(rowSums(x == v, na.rm = TRUE) > 0) + 1]
))

#     X1  X2       X3   X4         X5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid       <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat     <NA> Paid       <NA>
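
To see what the one-liner inside lapply does, it can help to run it for a single template value. This is only a sketch: it rebuilds x from the question's data, and the names hit, idx and paid_col are made up for illustration.

```r
# Same data as in the question, rebuilt so the snippet runs on its own
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)

v <- "Paid"

# x == v is a logical matrix; rowSums counts the matches per row,
# and na.rm = TRUE skips the NA cells instead of propagating them
hit <- rowSums(x == v, na.rm = TRUE) > 0

# FALSE becomes index 1 (NA), TRUE becomes index 2 (the value itself)
idx <- hit + 1
paid_col <- c(NA, v)[idx]

print(paid_col)
```

If the output columns should be called V1 to V5 as in the question, one option is to pass setNames(template, paste0("V", seq_along(template))) to lapply instead of naming them by position.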

Or an alternative approach using pmax:
data.frame(
  lapply(
    setNames(template, seq_along(template)),
    function(v) do.call(pmax, c(replace(x, x != v, NA), na.rm = TRUE))
  )
)
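
The pmax variant can be unpacked the same way for one value at a time. The sketch below assumes the question's data; masked and collapse_value are illustrative names, not part of the original answer.

```r
# Same data as in the question, rebuilt so the snippet runs on its own
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: blank out every cell that is not v, then take the
# row-wise maximum over the columns, ignoring NA. Each row is left with
# either v or nothing, so the maximum is v or NA.
collapse_value <- function(df, v) {
  masked <- replace(df, df != v, NA)
  do.call(pmax, c(masked, na.rm = TRUE))
}

eat_col <- collapse_value(x, "Eat")
domestic_col <- collapse_value(x, "Domestic")

print(eat_col)
print(domestic_col)
```

Both approaches do one pass over the whole data frame per template value, so for a large dataset the cost grows with the number of template entries rather than with the number of columns to reorder.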

This question about reordering columns in R based on matching a template comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/41796850/
