R: Convert a column with mixed 5-digit numbers and strings to dates

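I read in an Excel file that has a date-of-death column containing both dates and strings. It looks fine in Excel, but when it is read into R, all the dates are converted to 5-digit numbers, like this: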

#>  date of death  
#>   <chr>           
#> 1 44673         
#> 2 44674         
#> 3 alive
#> 4 not known
#> 5 NA

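Is there a way to do either of the following?

1. Read the Excel file so that the dates are not converted to 5-digit numbers?
2. If 1 is not possible, convert the column to dates, but only for the numeric entries?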

Best answer

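If we want to keep the strings as they are, we can convert the numeric elements to Date class after reading, then coalesce the result with the original column. Note that a column can hold only one type, so the result has to go back to character because of the strings: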

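library(dplyr)
# Numeric-looking entries become dates; everything else becomes NA and
# falls back to the original string via coalesce()
df1 <- df1 %>%
    mutate(`date of death` = coalesce(
        as.character(janitor::excel_numeric_to_date(
            suppressWarnings(as.numeric(`date of death`)))),
        `date of death`))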

Output:

df1
# A tibble: 5 × 2
  `date of death`  col2
  <chr>           <dbl>
1 2022-04-22          5
2 2022-04-23          3
3 alive               9
4 not known          10
5 NA                 11
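
The 5-digit values are Excel serial dates, that is, day counts in Excel's 1900 date system, which base R can convert using the origin 1899-12-30. As a minimal sketch of the same idea without janitor or dplyr, the following uses a hypothetical helper, `excel_serial_to_chr`, on a hypothetical sample vector that mirrors the column in the question. It assumes the workbook uses the default 1900 date system rather than the 1904 one.

```r
# Hypothetical sample data mirroring the column in the question
x <- c("44673", "44674", "alive", "not known", NA)

# Hypothetical helper: convert Excel serial numbers, leave other strings as-is
excel_serial_to_chr <- function(x) {
  # Only purely numeric strings are treated as serial dates
  is_serial <- !is.na(x) & grepl("^[0-9]+(\\.[0-9]+)?$", x)
  out <- x
  # Assumption: Excel's 1900 date system, whose day counts start at 1899-12-30
  out[is_serial] <- format(as.Date(as.numeric(x[is_serial]), origin = "1899-12-30"))
  out
}

excel_serial_to_chr(x)
#> [1] "2022-04-22" "2022-04-23" "alive"      "not known"  NA
```

Testing for numeric strings before calling `as.numeric()` avoids the coercion warning that the other strings would otherwise trigger.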

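Alternatively, try read.xlsx (from openxlsx) or read_excel (from readxl). With the sample I created, read_excel converts to Date class when col_types is specified, but that also turns the other string values in the same column into NA. read.xlsx with detectDates = TRUE, however, converts the dates and keeps the strings: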

library(openxlsx)
df1 <- read.xlsx(file.choose(),  detectDates = TRUE)
df1
  date.of.death col2
1 2022-04-22      5
2 2022-04-23      3
3      alive      9
4   not known    10
5        <NA>    11

> sapply(df1, class)
date.of.death          col2 
  "character"     "numeric"

To keep the spaces in the column names, we may need to specify check.names and sep.names:

df1 <- read.xlsx(file.choose(),  detectDates = TRUE,
      check.names = FALSE, sep.names = " ")
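
Because the column is still character after either approach, a true Date column for downstream date arithmetic has to live alongside the text. One possible sketch, using base R only, splits the values into a Date column and a status column. The data frame below is a hypothetical stand-in for the read.xlsx result shown above, and the names `death_date` and `death_status` are made up for illustration.

```r
# Hypothetical data frame shaped like the read.xlsx(detectDates = TRUE) result
df1 <- data.frame(
  date.of.death = c("2022-04-22", "2022-04-23", "alive", "not known", NA),
  col2 = c(5, 3, 9, 10, 11),
  stringsAsFactors = FALSE
)

# With an explicit format, strings that are not dates become NA
# instead of raising an error
df1$death_date <- as.Date(df1$date.of.death, format = "%Y-%m-%d")

# Keep the non-date text in its own column (hypothetical name)
df1$death_status <- ifelse(is.na(df1$death_date), df1$date.of.death, NA_character_)

sapply(df1, class)
```

Whether two columns are preferable to one character column depends on how the data is used later; this only illustrates the one-type-per-column point made in the answer.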

A similar question about this topic can be found on Stack Overflow: https://stackoverflow.com/questions/72663592/
