I have a data frame like the one shown below:
Hair Eye Freq
1 Black Brown 32
2 Brown Brown 53
3 Red Brown 10
4 Blond Brown 3
5 Red Blue 10
6 Blond Blue 30
7 Black Hazel 10
8 Blond Hazel 5
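The data frame above records the frequencies of 4 hair colors (Black, Brown, Red, and Blond) for different eye colors (Brown, Blue, and Hazel). However, some hair colors are missing for some eye colors. I want to fill in those missing hair/eye combinations with a frequency of 0, so that the result looks like the data frame below. Any help would be appreciated.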
Hair Eye Freq
1 Black Brown 32
2 Brown Brown 53
3 Red Brown 10
4 Blond Brown 3
5 Black Blue 0
6 Brown Blue 0
7 Red Blue 10
8 Blond Blue 30
9 Black Hazel 10
10 Brown Hazel 0
11 Red Hazel 0
12 Blond Hazel 5
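Best answer
Use expand.grid to create a new table, df2, containing every combination of hair and eye color. Then use a join to bring the frequencies from df1 into df2. Finally, replace the remaining NA values with 0.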
library('data.table')
hair <- c('Black', 'Brown', 'Red', 'Blond') # hair colors
eye <- c('Brown', 'Blue', 'Hazel') # eye colors
df2 <- expand.grid(Hair = hair, Eye = eye) # data frame with combinations of eye and hair colors
setDT(df2)[df1, `:=` (Freq = i.Freq), on = .(Hair, Eye)] # join df2 with df1 on Hair and Eye, copying `Freq` from df1 into df2 by reference
df2[is.na(Freq), Freq := 0 ] # replace NA (combinations absent from df1) with 0
Output:
df2
# Hair Eye Freq
# 1: Black Brown 32
# 2: Brown Brown 53
# 3: Red Brown 10
# 4: Blond Brown 3
# 5: Black Blue 0
# 6: Brown Blue 0
# 7: Red Blue 10
# 8: Blond Blue 30
# 9: Black Hazel 10
# 10: Brown Hazel 0
# 11: Red Hazel 0
# 12: Blond Hazel 5
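As a variation on the same idea, the combinations can also be built with data.table's own CJ() (cross join) instead of expand.grid. This is only a sketch under a few assumptions: the example data is re-typed from the question without the id column, and the names combos and df3 are made up for illustration.

```r
library(data.table)

# Example data re-typed from the question (id column omitted)
df1 <- data.table(
  Hair = c("Black", "Brown", "Red", "Blond", "Red", "Blond", "Black", "Blond"),
  Eye  = c("Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Hazel", "Hazel"),
  Freq = c(32, 53, 10, 3, 10, 30, 10, 5)
)

hair <- c("Black", "Brown", "Red", "Blond")
eye  <- c("Brown", "Blue", "Hazel")

# CJ() returns every combination as a data.table. Its first column varies
# slowest, so Eye is listed first to keep the rows grouped by eye color.
# sorted = FALSE keeps the colors in the order given above.
combos <- CJ(Eye = eye, Hair = hair, sorted = FALSE)

# Look up each combination in df1; combinations absent from df1 get Freq = NA
df3 <- df1[combos, on = .(Hair, Eye)]

# Replace NA with 0
df3[is.na(Freq), Freq := 0]
```

Unlike the update-by-reference join above, this builds a new table and leaves df1 untouched.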
Data:
df1 <- fread('id Hair Eye Freq
1 Black Brown 32
2 Brown Brown 53
3 Red Brown 10
4 Blond Brown 3
5 Red Blue 10
6 Blond Blue 30
7 Black Hazel 10
8 Blond Hazel 5')
df1[, id:=NULL]
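If data.table is not available, the same three steps (build all combinations, join, replace NA with 0) can be sketched in base R with merge(). This is not part of the original answer; the names all_combos and filled are hypothetical, and the data is re-typed from the question without the id column.

```r
# Example data re-typed from the question (id column omitted)
df1 <- data.frame(
  Hair = c("Black", "Brown", "Red", "Blond", "Red", "Blond", "Black", "Blond"),
  Eye  = c("Brown", "Brown", "Brown", "Brown", "Blue", "Blue", "Hazel", "Hazel"),
  Freq = c(32, 53, 10, 3, 10, 30, 10, 5),
  stringsAsFactors = FALSE
)

hair <- c("Black", "Brown", "Red", "Blond")
eye  <- c("Brown", "Blue", "Hazel")

# Step 1: every Hair/Eye combination
all_combos <- expand.grid(Hair = hair, Eye = eye, stringsAsFactors = FALSE)

# Step 2: left join; combinations absent from df1 get Freq = NA
filled <- merge(all_combos, df1, by = c("Hair", "Eye"), all.x = TRUE)

# Step 3: replace NA with 0
filled$Freq[is.na(filled$Freq)] <- 0

# merge() sorts by the key columns, so restore the original color order
filled$Hair <- factor(filled$Hair, levels = hair)
filled$Eye  <- factor(filled$Eye, levels = eye)
filled <- filled[order(filled$Eye, filled$Hair), ]
rownames(filled) <- NULL
```

The final reordering is optional; it only makes the rows appear in the same order as the desired output in the question.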
Source: a similar question about filling in missing rows in a data frame, found on Stack Overflow: https://stackoverflow.com/questions/42944351/