r - How to query a data frame and change data based on another column in R?

Tags: r dplyr tidyverse

I have simple time-tracking data that looks like this:

library(tidyverse)  # provides tribble() and the dplyr verbs used below

df <- tribble(~Date, ~Name, ~Team, ~Status, ~Hours_Type, ~Hours, ~Standard, ~Deficit, ~Overtime, ~Leave,
"April 16 2021",    "Jeff", "Coastal",  "FT",   "Billable", 40, 40, 0,  0,  0,
"April 23 2021",    "Jeff", "Coastal",  "FT",   "Billable", 40, 40, 0,  0,  0,
"April 16 2021",    "Jeff", "Coastal",  "FT",   "Leave",    0,  0,  0,  0, 0,
"April 23 2021",    "Jeff", "Coastal",  "FT",   "Leave",    0,  0,  0,  0, 0,
"April 16 2021",    "Megan",    "Coastal",  "FT",   "Billable", 40, 40, 0,  0,  0,
"April 23 2021",    "Megan",    "Coastal",  "FT",   "Billable", 40, 40, 0,  0,  0,
"April 16 2021",    "Megan",    "Coastal",  "FT",   "Leave",    0, 0,   0,  0, 0,
"April 23 2021",    "Megan",    "Coastal",  "FT",   "Leave",    0, 0,   0,  0, 0,
"April 16 2021",    "Minden",   "Coastal",  "FT",   "Billable", 16, 16, 24, 0,  0,
"April 23 2021",    "Minden",   "Coastal",  "FT",   "Billable", 28, 28, 12, 0,  0,
"April 16 2021",    "Minden",   "Coastal",  "FT",   "Leave",    24, 0,  0,  0, 24,
"April 23 2021",    "Minden",   "Coastal",  "FT",   "Leave",    0,  0,  0,  0, 0)

# A tibble: 12 x 10
   Date          Name   Team    Status Hours_Type Hours Standard Deficit Overtime Leave
   <chr>         <chr>  <chr>   <chr>  <chr>      <dbl>    <dbl>   <dbl>    <dbl> <dbl>
 1 April 16 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 2 April 23 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 3 April 16 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 4 April 23 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 5 April 16 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 6 April 23 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 7 April 16 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 8 April 23 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 9 April 16 2021 Minden Coastal FT     Billable      16       16      24        0     0
10 April 23 2021 Minden Coastal FT     Billable      28       28      12        0     0
11 April 16 2021 Minden Coastal FT     Leave         24        0       0        0    24
12 April 23 2021 Minden Coastal FT     Leave          0        0       0        0     0

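How can I query the Leave column and check whether the Deficit column for the same Date should actually be 0, because it is not really a deficit when it is covered by leave taken on the same date?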

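For example, how can I get R to check the Deficit, Leave, Name, and Date columns in order to modify this table and change Minden's 24-hour deficit on April 16 to 0 (row 9), since it is covered by the 24 hours of leave on April 16 (row 11)?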
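Before changing anything, it can help to list the Date/Name pairs where a deficit and leave coincide. This is only a minimal sketch, assuming dplyr is installed; the cut-down `toy` tibble and the `overlap` name are made up for illustration and mirror just the relevant columns of the data above.

```r
library(dplyr)

# Hypothetical cut-down copy of the data: only the columns the check needs.
toy <- tibble::tribble(
  ~Date,           ~Name,    ~Hours_Type, ~Deficit, ~Leave,
  "April 16 2021", "Jeff",   "Billable",   0,        0,
  "April 16 2021", "Jeff",   "Leave",      0,        0,
  "April 16 2021", "Minden", "Billable",  24,        0,
  "April 16 2021", "Minden", "Leave",      0,       24,
  "April 23 2021", "Minden", "Billable",  12,        0,
  "April 23 2021", "Minden", "Leave",      0,        0
)

# Total deficit and total leave per person and date, keeping only the
# pairs where a deficit coincides with leave.
overlap <- toy %>%
  group_by(Date, Name) %>%
  summarise(Deficit = sum(Deficit), Leave = sum(Leave), .groups = "drop") %>%
  filter(Deficit > 0, Leave > 0)

overlap
```

With this toy data the only pair left is Minden on April 16 2021, which is the row the question wants corrected.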

This would be the expected result, ideally with code that generalizes to the whole dataset:

# A tibble: 12 x 10
   Date          Name   Team    Status Hours_Type Hours Standard Deficit Overtime Leave
   <chr>         <chr>  <chr>   <chr>  <chr>      <dbl>    <dbl>   <dbl>    <dbl> <dbl>
 1 April 16 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 2 April 23 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 3 April 16 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 4 April 23 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 5 April 16 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 6 April 23 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 7 April 16 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 8 April 23 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 9 April 16 2021 Minden Coastal FT     Billable      16       16       0        0     0
10 April 23 2021 Minden Coastal FT     Billable      28       28      12        0     0
11 April 16 2021 Minden Coastal FT     Leave         24        0       0        0    24
12 April 23 2021 Minden Coastal FT     Leave          0        0       0        0     0

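Note: I have to keep the Leave column because I use it in a stacked bar chart to visualize this data. In this example, the 24 hours of Deficit for Minden should really be Leave, but I don't know how to make this change automatically, only by hand. [Image: stacked bar chart]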

Best answer

I think this strategy works best (although your example does not cover other possible scenarios):

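# Within each Date/Name group, subtract the hours on the Leave row
# from the Deficit on the Billable row; Leave rows stay unchanged.
df %>% group_by(Date, Name) %>%
  mutate(Deficit = ifelse(Hours_Type == "Billable", Deficit - Leave[Hours_Type == "Leave"], Deficit))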

# A tibble: 12 x 10
# Groups:   Date, Name [6]
   Date          Name   Team    Status Hours_Type Hours Standard Deficit Overtime Leave
   <chr>         <chr>  <chr>   <chr>  <chr>      <dbl>    <dbl>   <dbl>    <dbl> <dbl>
 1 April 16 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 2 April 23 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 3 April 16 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 4 April 23 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 5 April 16 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 6 April 23 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 7 April 16 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 8 April 23 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 9 April 16 2021 Minden Coastal FT     Billable      16       16       0        0     0
10 April 23 2021 Minden Coastal FT     Billable      28       28      12        0     0
11 April 16 2021 Minden Coastal FT     Leave         24        0       0        0    24
12 April 23 2021 Minden Coastal FT     Leave          0        0       0        0     0
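
To see what the grouped `mutate()` does inside a single Date/Name group, here is the same arithmetic in base R on one group (Minden, April 16 2021). This is just an illustrative sketch; the vector names `hours_type`, `deficit`, `leave`, `leave_taken`, and `new_deficit` are made up for the example.

```r
# One Date/Name group from the data above (Minden, April 16 2021).
hours_type <- c("Billable", "Leave")
deficit    <- c(24, 0)
leave      <- c(0, 24)

# Leave recorded on the group's "Leave" row: a single value here.
leave_taken <- leave[hours_type == "Leave"]

# Subtract it on the Billable row only; the Leave row keeps its Deficit.
new_deficit <- ifelse(hours_type == "Billable", deficit - leave_taken, deficit)
new_deficit
```

Because the group has exactly one Leave row, `leave_taken` is a single number that is recycled across the group's rows, and `ifelse()` then picks the adjusted value only where `Hours_Type` is "Billable".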

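Let's try another scenario: Jeff has a 12-hour deficit on the 16th and takes 6 hours of leave, and Megan has a 15-hour deficit on the 23rd with no leave at all. In that case df would be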

# A tibble: 12 x 10
   Date          Name   Team    Status Hours_Type Hours Standard Deficit Overtime Leave
   <chr>         <chr>  <chr>   <chr>  <chr>      <dbl>    <dbl>   <dbl>    <dbl> <dbl>
 1 April 16 2021 Jeff   Coastal FT     Billable      40       40      12        0     0
 2 April 23 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 3 April 16 2021 Jeff   Coastal FT     Leave          0        0       0        0     6
 4 April 23 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 5 April 16 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 6 April 23 2021 Megan  Coastal FT     Billable      40       40      15        0     0
 7 April 16 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 8 April 23 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 9 April 16 2021 Minden Coastal FT     Billable      16       16      24        0     0
10 April 23 2021 Minden Coastal FT     Billable      28       28      12        0     0
11 April 16 2021 Minden Coastal FT     Leave         24        0       0        0    24
12 April 23 2021 Minden Coastal FT     Leave          0        0       0        0     0

and the output is

# A tibble: 12 x 10
# Groups:   Date, Name [6]
   Date          Name   Team    Status Hours_Type Hours Standard Deficit Overtime Leave
   <chr>         <chr>  <chr>   <chr>  <chr>      <dbl>    <dbl>   <dbl>    <dbl> <dbl>
 1 April 16 2021 Jeff   Coastal FT     Billable      40       40       6        0     0
 2 April 23 2021 Jeff   Coastal FT     Billable      40       40       0        0     0
 3 April 16 2021 Jeff   Coastal FT     Leave          0        0       0        0     6
 4 April 23 2021 Jeff   Coastal FT     Leave          0        0       0        0     0
 5 April 16 2021 Megan  Coastal FT     Billable      40       40       0        0     0
 6 April 23 2021 Megan  Coastal FT     Billable      40       40      15        0     0
 7 April 16 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 8 April 23 2021 Megan  Coastal FT     Leave          0        0       0        0     0
 9 April 16 2021 Minden Coastal FT     Billable      16       16       0        0     0
10 April 23 2021 Minden Coastal FT     Billable      28       28      12        0     0
11 April 16 2021 Minden Coastal FT     Leave         24        0       0        0    24
12 April 23 2021 Minden Coastal FT     Leave          0        0       0        0     0

It should match your expectations and the logic you described.

[Image: the modified scenario, billable hours only]
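
The one-liner above seems to rely on every Date/Name group having exactly one Leave row, and on leave never exceeding the deficit. If either assumption might not hold in the full dataset, a more defensive variant could look like the sketch below. It is not part of the original answer: the `toy` data, the extra person "Ana", and the helper column `leave_taken` are hypothetical, and clamping the deficit at zero is an assumption about the desired behaviour.

```r
library(dplyr)

# Hypothetical data covering the edge cases; "Ana" is invented for illustration.
toy <- tibble::tribble(
  ~Date,           ~Name,    ~Hours_Type, ~Deficit, ~Leave,
  "April 16 2021", "Minden", "Billable",  24,        0,
  "April 16 2021", "Minden", "Leave",      0,       24,
  "April 16 2021", "Jeff",   "Billable",  12,        0,
  "April 16 2021", "Jeff",   "Leave",      0,        6,
  "April 23 2021", "Megan",  "Billable",  15,        0,   # no Leave row for this pair
  "April 23 2021", "Ana",    "Billable",   4,        0,
  "April 23 2021", "Ana",    "Leave",      0,       10    # leave exceeds the deficit
)

adjusted <- toy %>%
  group_by(Date, Name) %>%
  mutate(
    # sum() returns 0 for a group with no Leave row, so nothing is subtracted.
    leave_taken = sum(Leave[Hours_Type == "Leave"]),
    # pmax() stops the adjusted deficit from going below zero.
    Deficit = if_else(Hours_Type == "Billable",
                      pmax(Deficit - leave_taken, 0),
                      Deficit)
  ) %>%
  ungroup() %>%
  select(-leave_taken)

adjusted
```

The Leave column is left untouched, so it can still feed the stacked bar chart mentioned in the question.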

Regarding "r - How to query a data frame and change data based on another column in R?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67119391/
