I have a df that looks like this:
structure(list(id = c("2023021112", "2023021112", "2023021112",
"2023021112", "2023021112", "2023021112", "2023021112", "2023021112",
"2023021112", "2023021112"), response = c("1", "Happy", "Sad",
"Neutral", "Fearful", "2", "Disgusted", "Happy", "Sad", "Surprised"
)), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L
), class = "data.frame")
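For the rows between 1 and 2 in column 2, I want to append "-01" to the id (i.e. 2023021112-01). For the rows after 2 (in the full dataset these would lie between 2 and 3, and in general between n and n+1), I want to append "-02" (i.e. 2023021112-02). Then I want to remove every row that contains one of these numeric values in column 2.
How do I express "for the rows between n and n+1, append "-0n" to the id column"? Desired result: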
structure(list(id = c("2023021112-01", "2023021112-01", "2023021112-01",
"2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02",
"2023021112-02"), response = c("Happy", "Sad", "Neutral", "Fearful",
"Disgusted", "Happy", "Sad", "Surprised")), row.names = c(2L, 3L,
4L, 5L, 73L, 74L, 75L, 76L), class = "data.frame")
Best answer
library(dplyr)

suppressWarnings(
  df1 %>%
    mutate(id = paste(id,
                      sprintf("%02d", cumsum(!is.na(as.numeric(response)))),
                      sep = "-")) %>%
    filter(is.na(as.numeric(response)))
)
#>              id  response
#> 1 2023021112-01     Happy
#> 2 2023021112-01       Sad
#> 3 2023021112-01   Neutral
#> 4 2023021112-01   Fearful
#> 5 2023021112-02 Disgusted
#> 6 2023021112-02     Happy
#> 7 2023021112-02       Sad
#> 8 2023021112-02 Surprised
Created on 2023-11-02 with reprex v2.0.2
Data:
structure(list(id = c("2023021112", "2023021112", "2023021112",
"2023021112", "2023021112", "2023021112",
"2023021112", "2023021112", "2023021112", "2023021112"),
response = c("1", "Happy", "Sad", "Neutral", "Fearful", "2",
"Disgusted", "Happy", "Sad", "Surprised"
)), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L
), class = "data.frame") -> df1
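Explanation:
Here we check which rows of the response column hold a numeric value (or, more precisely, a value that as.numeric can convert to a number). Then, using cumsum, we increment the "identifier" every time the response changes from a string to a number.
Let's look at the result of each function/step in a separate column to understand this better: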
df1 %>%
  mutate(step1 = as.numeric(response),
         step2 = !is.na(step1),
         step3 = cumsum(step2),
         step4 = sprintf("%02d", step3))
#>            id  response step1 step2 step3 step4
#> 1  2023021112         1     1  TRUE     1    01
#> 2  2023021112     Happy    NA FALSE     1    01
#> 3  2023021112       Sad    NA FALSE     1    01
#> 4  2023021112   Neutral    NA FALSE     1    01
#> 5  2023021112   Fearful    NA FALSE     1    01
#> 72 2023021112         2     2  TRUE     2    02
#> 73 2023021112 Disgusted    NA FALSE     2    02
#> 74 2023021112     Happy    NA FALSE     2    02
#> 75 2023021112       Sad    NA FALSE     2    02
#> 76 2023021112 Surprised    NA FALSE     2    02
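From here, we simply paste the id and step4 columns together and filter the rows on !step2, which keeps only the rows whose response is not numeric.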
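As a rough alternative sketch (not part of the original answer), the same idea can be written in base R without dplyr. It assumes the marker rows consist only of digits, so grepl with the pattern "^[0-9]+$" is swapped in for as.numeric, which also avoids the coercion warning. The names is_marker, block and result are made up for this illustration.

```r
# Same data as df1 above, rebuilt here so the block runs on its own
# (row names are left at their defaults, unlike the original df1).
df1 <- data.frame(
  id = rep("2023021112", 10),
  response = c("1", "Happy", "Sad", "Neutral", "Fearful",
               "2", "Disgusted", "Happy", "Sad", "Surprised"),
  stringsAsFactors = FALSE
)

# Assumption: a marker row is a response made up of digits only.
is_marker <- grepl("^[0-9]+$", df1$response)

# Running count of marker rows seen so far = block number of each row.
block <- cumsum(is_marker)

# Append the zero-padded block number, then drop the marker rows themselves.
df1$id <- paste(df1$id, sprintf("%02d", block), sep = "-")
result <- df1[!is_marker, ]
result
```

As in the dplyr version, the suffix comes from counting the marker rows in order, not from the number stored in the marker itself, so both approaches rely on the markers appearing as 1, 2, 3, ... in sequence.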
Regarding "r - Selecting rows between certain (unknown) indices", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/77412511/