r - Split a string column into year, month, and day

Tags: r dplyr

I have a dataset whose info column looks like the data below. How can I split it into year, month, and day columns?

Code:

  df = structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), info = c("PRISM_ppt_provisional_4kmD2_20220925_bil", 
    "PRISM_ppt_provisional_4kmD2_20220926_bil", "PRISM_ppt_provisional_4kmD2_20220927_bil", 
    "PRISM_ppt_provisional_4kmD2_20220928_bil", "PRISM_ppt_provisional_4kmD2_20220929_bil", 
    "PRISM_ppt_provisional_4kmD2_20220930_bil", "PRISM_ppt_provisional_4kmD2_20220925_bil", 
    "PRISM_ppt_provisional_4kmD2_20220926_bil")), class = "data.frame", row.names = c(NA, 
    -8L))
    
desired_df = structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), info = c("PRISM_ppt_provisional_4kmD2_20220925_bil", 
"PRISM_ppt_provisional_4kmD2_20220926_bil", "PRISM_ppt_provisional_4kmD2_20220927_bil", 
"PRISM_ppt_provisional_4kmD2_20220928_bil", "PRISM_ppt_provisional_4kmD2_20220929_bil", 
"PRISM_ppt_provisional_4kmD2_20220930_bil", "PRISM_ppt_provisional_4kmD2_20220925_bil", 
"PRISM_ppt_provisional_4kmD2_20220926_bil"), year = c(2022, 2022, 
2022, 2022, 2022, 2022, 2022, 2022), month = c(9, 9, 9, 9, 9, 
9, 9, 9), day = c(25, 26, 27, 28, 29, 30, 25, 26)), class = "data.frame", row.names = c(NA, 
-8L))

    # Extract year, month and day from info column
    df = separate(df, info, into = c("year", "month", "day"), sep = ?, convert = T)

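Best answer

In this case it is better to use extract. The year, month, and day run together with no separator between them, so a regular expression with one capture group per field is a better fit than separate: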

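library(tidyr)
# Each capture group in the regex becomes one new column;
# remove = FALSE keeps the original info column.
df %>% 
  extract(info, "PRISM_ppt_provisional_4kmD2_(\\d{4})(\\d{2})(\\d{2})_bil",
          into = c("year", "month", "day"), remove = FALSE)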

#   id                                     info year month day
# 1  1 PRISM_ppt_provisional_4kmD2_20220925_bil 2022    09  25
# 2  2 PRISM_ppt_provisional_4kmD2_20220926_bil 2022    09  26
# 3  3 PRISM_ppt_provisional_4kmD2_20220927_bil 2022    09  27
# 4  4 PRISM_ppt_provisional_4kmD2_20220928_bil 2022    09  28
# 5  5 PRISM_ppt_provisional_4kmD2_20220929_bil 2022    09  29
# 6  6 PRISM_ppt_provisional_4kmD2_20220930_bil 2022    09  30
# 7  7 PRISM_ppt_provisional_4kmD2_20220925_bil 2022    09  25
# 8  8 PRISM_ppt_provisional_4kmD2_20220926_bil 2022    09  26
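
The output above shows month as "09", which means the new columns are character strings, whereas the desired data frame in the question holds numbers. A minimal sketch of one way to close that gap, assuming tidyr's convert argument is acceptable for this data; df_small is a hypothetical two-row stand-in for the question's df:

```r
library(tidyr)

# Hypothetical two-row stand-in for the question's data
df_small <- data.frame(
  id = c(1, 2),
  info = c("PRISM_ppt_provisional_4kmD2_20220925_bil",
           "PRISM_ppt_provisional_4kmD2_20220926_bil")
)

# convert = TRUE applies type conversion to the new columns,
# so the captured digit strings are stored as integers
res <- extract(df_small, info,
               into = c("year", "month", "day"),
               regex = "PRISM_ppt_provisional_4kmD2_(\\d{4})(\\d{2})(\\d{2})_bil",
               remove = FALSE, convert = TRUE)

str(res)
```

With conversion, the leading zero in the month is dropped because the value is stored as a number rather than as text.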

If your end goal is to create a date column, this may be better:

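library(tidyr)
library(dplyr)      # provides mutate()
library(lubridate)
df %>% 
  # Capture the date string between the fixed prefix and suffix
  extract(info, "PRISM_ppt_provisional_4kmD2_(.*)_bil",
          into = "date", remove = FALSE) %>% 
  # Parse it as a Date, then derive numeric year, month, and day
  mutate(date = ymd(date),
         year = year(date),
         month = month(date),
         day = day(date))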
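
For comparison, the same date-first idea can be sketched in base R without tidyr or lubridate. This is an alternative written for illustration, not part of the original answer; it assumes every info value contains exactly one 8-digit date between underscores, and df_small is a hypothetical two-row stand-in for the question's df:

```r
# Hypothetical two-row stand-in for the question's data
df_small <- data.frame(
  id = c(1, 2),
  info = c("PRISM_ppt_provisional_4kmD2_20220925_bil",
           "PRISM_ppt_provisional_4kmD2_20220926_bil")
)

# Keep only the 8-digit run that sits between two underscores
date_chr <- sub(".*_([0-9]{8})_.*", "\\1", df_small$info)

# Parse the yyyymmdd string into a Date
df_small$date <- as.Date(date_chr, format = "%Y%m%d")

# Derive integer year, month, and day from the parsed date
df_small$year  <- as.integer(format(df_small$date, "%Y"))
df_small$month <- as.integer(format(df_small$date, "%m"))
df_small$day   <- as.integer(format(df_small$date, "%d"))

str(df_small)
```

Parsing into a Date first means an impossible value such as 20220931 becomes NA instead of silently producing a day of 31.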

Regarding "r - Split a string column into year, month, and day", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/74056262/
