r - How to use spread() to get the desired output

Tags: r dplyr tidyr tidyverse spread

Suppose I have a data frame df1 that looks like this:

+------+------+--------+
| ID   | Type | Points |
+------+------+--------+
| DJ45 | A    |   69.2 |
| DJ45 | F    |   60.8 |
| DJ45 | C    |    2.9 |
| DJ46 | B    |   22.7 |
| DJ46 | D    |   18.7 |
| DJ46 | A    |   16.1 |
| DJ47 | E    |   67.2 |
| DJ47 | C    |   63.1 |
| DJ47 | F    |   16.7 |
| DJ48 | D    |    8.4 |
+------+------+--------+

For each ID, I want the top 2 Types (ranked by Points) in the following format:

Desired output:

+------+-------+-------+
| ID   | Type1 | Type2 |
+------+-------+-------+
| DJ45 | A     | F     |
| DJ46 | B     | D     |
| DJ47 | E     | C     |
| DJ48 | D     | NA    |
+------+-------+-------+

I tried:

library(dplyr)
library(tidyr)

df1 %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%
  mutate(val = paste("Type", row_number())) %>%
  filter(row_number() <= 2) %>%
  select(-Points) %>%
  spread(val, Type)

But I get the following result instead:

Output:

+------+--------+-------+-------+
| ID   | Points | Type1 | Type2 |
+------+--------+-------+-------+
| DJ45 |   69.2 | A     | NA    |
| DJ45 |   60.8 | NA    | F     |
| DJ46 |   22.7 | B     | NA    |
| DJ46 |   18.7 | NA    | D     |
| DJ47 |   67.2 | E     | NA    |
| DJ47 |   63.1 | NA    | C     |
| DJ48 |    8.4 | D     | NA    |
+------+--------+-------+-------+
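The staircase of NAs above is what spread() produces when an extra column makes every row unique. spread() treats every column other than the key and value as a row identifier, so while Points is still present, each ID/Points pair becomes its own output row. A minimal sketch on a hypothetical two-ID subset (the data frame `ranked` and its precomputed `val` labels are illustrative, not taken from the question):

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data, already labelled Type1/Type2
ranked <- data.frame(
  ID     = c("DJ45", "DJ45", "DJ46", "DJ46"),
  Points = c(69.2, 60.8, 22.7, 18.7),
  val    = c("Type1", "Type2", "Type1", "Type2"),
  Type   = c("A", "F", "B", "D"),
  stringsAsFactors = FALSE
)

# Points is still present, so (ID, Points) identifies each output row:
# all four input rows survive, each with one filled cell and one NA
with_points <- spread(ranked, val, Type)

# With Points dropped, ID is the only identifier, so rows collapse per ID
without_points <- ranked %>%
  select(-Points) %>%
  spread(val, Type)
```

`with_points` has four rows shaped like the output above. `without_points` has one row per ID (DJ45: A, F; DJ46: B, D).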

Best answer

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Type Points
DJ45 A 69.2
DJ45 F 60.8
DJ45 C 2.9
DJ46 B 22.7
DJ46 D 18.7
DJ46 A 16.1
DJ47 E 67.2
DJ47 C 63.1
DJ47 F 16.7
DJ48 D 8.4
")

library(dplyr)
library(tidyr)

df %>%
  group_by(ID) %>%
  top_n(2, wt = Points) %>%
  arrange(-Points) %>% 
  mutate(Points = paste0('Type', row_number())) %>% 
  spread(Points, Type)
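  • top_n(2, wt = Points) keeps the two rows with the highest Points within each ID group
  • arrange(-Points) sorts those rows in descending order of Points
  • mutate(Points = paste0('Type', row_number())) overwrites Points with 'Type' followed by the row number within the ID group (1 to 2)
  • spread(Points, Type) creates a column for each unique value of Points and fills it with the corresponding Type value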

The original question, "r - How to use spread() to get the desired output", is on Stack Overflow: https://stackoverflow.com/questions/43995218/
