r - Disparate entries in a date column, with the aim of preserving the column before dropping. How best to clean up a "date" column like this?

df <- structure(list(year = c("Mar-10", "2014", "May-August", 
"2009/2010", "2015", NA_character_), date = c("August 31st, 2010", "March 13th, 2015", 
"May 31st, 2010", "June 16th, 2010", "May 18th, 2010", "April 7th, 2010")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

# # A tibble: 6 × 2
#   year       date             
#   <chr>      <chr>            
# 1 Mar-10     August 31st, 2010
# 2 2014       March 13th, 2015 
# 3 May-August May 31st, 2010   
# 4 2009/2010  June 16th, 2010  
# 5 2015       May 18th, 2010   
# 6 NA         April 7th, 2010

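My aim is to preserve as much of the column as possible before I start dropping erroneous entries in column 1, ideally by reducing each entry to a plain year value (as in row 2 of this sample set).

For NA values, I don't want to drop the row; instead I want to fill in the year from the next column.

Expected output: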

# # A tibble: 6 × 2
#   year  date             
#   <chr> <chr>            
# 1 2010  August 31st, 2010
# 2 2014  March 13th, 2015 
# 3 2010  May 31st, 2010   
# 4 2010  June 16th, 2010  
# 5 2015  May 18th, 2010   
# 6 2010  April 7th, 2010

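In plain English: if the field contains an acceptable value, such as "2014", leave it as is. If it contains a value that still identifies the year, such as "Mar-10", use 2010. If the year cannot be determined, as with "May-August", "2009/2010", or an NA value, use the year from the date column.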

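Best answer

You can use coalesce + str_extract. The anchored pattern "^\\d{4}$" keeps only entries in year that are exactly four digits; everything else becomes NA, and coalesce then fills those gaps with the first four-digit run found in date: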

library(dplyr)
library(stringr)

df %>%
  mutate(year = coalesce(str_extract(year, "^\\d{4}$"), str_extract(date, "\\d{4}")))

# # A tibble: 6 × 2
#   year  date             
#   <chr> <chr>            
# 1 2010  August 31st, 2010
# 2 2014  March 13th, 2015 
# 3 2010  May 31st, 2010   
# 4 2010  June 16th, 2010  
# 5 2015  May 18th, 2010   
# 6 2010  April 7th, 2010
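
Note that the one-liner above never reads the "10" in "Mar-10"; row 1 ends up as 2010 only because its date column also says 2010. If the second rule from the question should be applied on its own terms, a minimal sketch could add one more candidate to coalesce. It assumes such entries always look like a three-letter month, a hyphen, and a two-digit year, and that every two-digit year belongs to the 2000s; both assumptions go beyond what the question states, and the name cleaned is only illustrative.

```r
library(dplyr)
library(stringr)

# Sample data from the question.
df <- tibble(
  year = c("Mar-10", "2014", "May-August", "2009/2010", "2015", NA_character_),
  date = c("August 31st, 2010", "March 13th, 2015", "May 31st, 2010",
           "June 16th, 2010", "May 18th, 2010", "April 7th, 2010")
)

cleaned <- df %>%
  mutate(
    year = coalesce(
      # Rule 1: the entry is already a bare four-digit year.
      str_extract(year, "^\\d{4}$"),
      # Rule 2: "Mon-YY" entries such as "Mar-10".
      # ASSUMPTION: two-digit years are always 20YY.
      # str_match() returns NA for non-matching rows, and str_c() propagates NA.
      str_c("20", str_match(year, "^[A-Za-z]{3}-(\\d{2})$")[, 2]),
      # Rule 3: otherwise take the first four-digit run in the date column.
      str_extract(date, "\\d{4}")
    )
  )

cleaned
```

On the sample data this gives the same year column as the shorter version; the two only differ when a "Mon-YY" entry disagrees with the year in date.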

This question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/75566502/
