I have a large dataset that looks like this:
V1 V2 V3 V4
1 Sleep Domestic Eat Child Care
2 Sleep Domestic Eat Paid
3 Sleep Domestic Eat Child Care
4 Sleep Eat Paid <NA>
What I want to do is reorder the columns based on a "template" ["Sleep", "Eat", "Domestic", "Paid", "Child care"], so that each template value gets its own column, to get this output:
V1 V2 V3 V4 V5
Sleep Eat Domestic NA Child Care
Sleep Eat Domestic Paid NA
Sleep Eat Domestic NA Child Care
Sleep Eat NA Paid NA
So column 1 holds Sleep, column 2 holds Eat, and so on, with NA wherever a row does not contain that value. I don't know where to start.
Any ideas?
Data
x = structure(list(V1 = c("Sleep", "Sleep", "Sleep", "Sleep"), V2 = c("Domestic",
"Domestic", "Domestic", "Eat"), V3 = c("Eat", "Eat", "Eat", "Paid"
), V4 = c("Child Care", "Paid", "Child Care", NA)), .Names = c("V1",
"V2", "V3", "V4"), row.names = c(NA, 4L), class = "data.frame")
template = c('Sleep', 'Eat', 'Domestic', 'Paid', 'Child care')
Best answer
For each template value, use rowSums to check which rows contain it, then piece the resulting columns back together:
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")
# I've fixed this template so the case matches the values for 'Child Care'
data.frame(lapply(
setNames(template, seq_along(template)),
function(v) c(NA,v)[(rowSums(x==v,na.rm=TRUE)>0)+1]
))
# X1 X2 X3 X4 X5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat <NA> Paid <NA>
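To see what the anonymous function does, it may help to unpack it for a single template value. The following is only a sketch that rebuilds the question's data and walks through the steps for "Paid"; the intermediate names hits, present and paid_col are made up for illustration and are not part of the answer.

```r
# Rebuild the example data from the question
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)

v <- "Paid"

# Logical matrix: TRUE where a cell equals v, NA where the cell is NA
hits <- x == v

# TRUE for rows that contain v at least once (NA cells are ignored)
present <- rowSums(hits, na.rm = TRUE) > 0

# FALSE + 1 = 1 picks NA, TRUE + 1 = 2 picks v
paid_col <- c(NA, v)[present + 1]
print(paid_col)
```

Rows 2 and 4 contain "Paid", so those positions come back as "Paid" and the rest as NA, which matches column X4 in the output above.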
Or, as an alternative, use pmax:
data.frame(
lapply(
setNames(template, seq_along(template)),
function(v) do.call(pmax, c(replace(x, x != v,NA),na.rm=TRUE))
)
)
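The pmax version relies on the fact that, once every non-matching cell has been set to NA, each row holds at most one distinct non-NA value, so a parallel maximum with na.rm=TRUE collapses the columns into one. A small sketch of that collapsing step on plain character vectors (col_a, col_b and col_c are hypothetical stand-ins for the masked columns, not names from the answer):

```r
# Three masked "columns": only cells equal to "Paid" survive, the rest are NA
col_a <- c(NA_character_, "Paid", NA_character_)
col_b <- c(NA_character_, NA_character_, NA_character_)
col_c <- c("Paid", NA_character_, NA_character_)

# Element-wise maximum across the columns; with na.rm = TRUE an NA is
# only returned where every column is NA at that position
collapsed <- pmax(col_a, col_b, col_c, na.rm = TRUE)
print(collapsed)
```

Positions 1 and 2 each have one surviving "Paid", so they come back as "Paid"; position 3 is NA in every column and stays NA.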
Regarding "R - reordering columns based on a match (template)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41796850/