r - Select rows between certain (unknown) indices

I have a df that looks like this:

structure(list(id = c("2023021112", "2023021112", "2023021112", 
"2023021112", "2023021112", "2023021112", "2023021112", "2023021112", 
"2023021112", "2023021112"), response = c("1", "Happy", "Sad", 
"Neutral", "Fearful", "2", "Disgusted", "Happy", "Sad", "Surprised"
)), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L
), class = "data.frame")

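For the rows between 1 and 2 in column 2, I want to append "-01" to the ID (i.e. 2023021112-01). For the rows after 2 (in the full dataset these would lie between 2 and 3, and eventually between n and n+1), I want to append "-02" (i.e. 2023021112-02). Then I want to delete every row whose value in column 2 is one of these numbers.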

How do I express "for the rows between n and n+1, append "-0n" to the id column"? Desired result:

structure(list(id = c("2023021112-01", "2023021112-01", "2023021112-01", 
"2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02", 
"2023021112-02"), response = c("Happy", "Sad", "Neutral", "Fearful", 
"Disgusted", "Happy", "Sad", "Surprised")), row.names = c(2L, 3L, 
4L, 5L, 73L, 74L, 75L, 76L), class = "data.frame")

Best answer

library(dplyr)

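# as.numeric() warns "NAs introduced by coercion" on the text rows; silence it
suppressWarnings(
df1 %>% 
  # running count of numeric marker rows gives the group number 01, 02, ...
  mutate(id = paste(id, 
                    sprintf("%02d", cumsum(!is.na(as.numeric(response)))),
                    sep = "-")) %>% 
  # keep only the non-numeric (non-marker) rows
  filter(is.na(as.numeric(response)))
)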

#>              id  response
#> 1 2023021112-01     Happy
#> 2 2023021112-01       Sad
#> 3 2023021112-01   Neutral
#> 4 2023021112-01   Fearful
#> 5 2023021112-02 Disgusted
#> 6 2023021112-02     Happy
#> 7 2023021112-02       Sad
#> 8 2023021112-02 Surprised

Created on 2023-11-02 with reprex v2.0.2

Data:

structure(list(id = c("2023021112", "2023021112", "2023021112", 
                      "2023021112", "2023021112", "2023021112", 
                      "2023021112", "2023021112", "2023021112", "2023021112"), 
               response = c("1", "Happy", "Sad", "Neutral", "Fearful", "2", 
                            "Disgusted", "Happy", "Sad", "Surprised"
                      )), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L
                      ), class = "data.frame") -> df1

Explanation:

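Here we check which rows of the response column hold numeric values (more precisely, values that the as.numeric function can convert to a number). Then cumsum increments the group "identifier" by one each time a numeric value appears in response, i.e. every time response switches from a string to a number.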
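As a hedged alternative to the as.numeric test, a regular expression can flag the marker rows directly, which avoids the coercion warnings and therefore the need for suppressWarnings. This is a minimal base-R sketch that assumes the markers are plain non-negative integers such as "1", "2" or "12"; the helper name is_marker is hypothetical and not part of the answer above.

```r
# Hypothetical helper (assumption): TRUE for strings made only of digits,
# i.e. the marker rows such as "1" or "2"
is_marker <- function(x) grepl("^[0-9]+$", x)

response <- c("1", "Happy", "Sad", "Neutral", "Fearful",
              "2", "Disgusted", "Happy", "Sad", "Surprised")

marker <- is_marker(response)   # TRUE only at the marker positions
group  <- cumsum(marker)        # running count of markers seen so far
suffix <- sprintf("%02d", group)  # zero-padded suffix; no coercion warnings
suffix
```

Because grepl never coerces anything, the same marker vector can be reused both for building the suffix and for dropping the marker rows afterwards.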

Let's look at the result of each function/step in a separate column to understand this better:

df1 %>% 
  mutate(step1 = as.numeric(response),
         step2 = !is.na(step1),
         step3 = cumsum(step2),
         step4 = sprintf("%02d", step3))

#>            id  response step1 step2 step3 step4
#> 1  2023021112         1     1  TRUE     1    01
#> 2  2023021112     Happy    NA FALSE     1    01
#> 3  2023021112       Sad    NA FALSE     1    01
#> 4  2023021112   Neutral    NA FALSE     1    01
#> 5  2023021112   Fearful    NA FALSE     1    01
#> 72 2023021112         2     2  TRUE     2    02
#> 73 2023021112 Disgusted    NA FALSE     2    02
#> 74 2023021112     Happy    NA FALSE     2    02
#> 75 2023021112       Sad    NA FALSE     2    02
#> 76 2023021112 Surprised    NA FALSE     2    02

From here, we simply paste the id and step4 columns together and filter the rows on !step2, keeping only the non-numeric rows.
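Note that cumsum produces a running count, so the suffix equals the marker value n only when the markers are exactly 1, 2, 3, ... in order. If the suffix should follow the marker's own value (the "-0n" in the question), one option is to carry the most recent marker value forward. The following is a minimal base-R sketch of that variant, not the answer's method; it assumes the first row of the data is a marker (otherwise cumsum yields 0 for the leading rows and the indexing below silently drops them).

```r
df1 <- structure(list(id = rep("2023021112", 10),
                      response = c("1", "Happy", "Sad", "Neutral", "Fearful", "2",
                                   "Disgusted", "Happy", "Sad", "Surprised")),
                 row.names = c(1:5, 72:76), class = "data.frame")

# Marker rows are the ones whose response consists only of digits
marker <- grepl("^[0-9]+$", df1$response)

# For every row, the position (among the markers) of the most recent marker.
# Assumption: the first row is a marker, so no entry of last_marker is 0.
last_marker <- cumsum(marker)

# Use the marker's own value n, not just the running count, for the suffix
n <- as.integer(df1$response[marker])[last_marker]

df1$id <- paste(df1$id, sprintf("%02d", n), sep = "-")

# Drop the marker rows themselves; the original row names are preserved
result <- df1[!marker, ]
result
```

With markers 1, 2, 5 this version would give the suffixes -01, -02, -05, whereas the cumsum counter would give -01, -02, -03.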

For a similar question about r - select rows between certain (unknown) indices, see Stack Overflow: https://stackoverflow.com/questions/77412511/
