I have a large dataset containing numeric, character, and some date variables.
structure(list(DOB = structure(c(18155, 18164,
18785, 18328, 18314, 18307, 18324), class = "Date"), date_today_ppt_SEEQ = structure(c(18155,
18164, 18785, 18328, 18314, 18307, 18324), class = "Date"), switching_home = c("Sometimes",
"Most of the time", "Sometimes", "Sometimes", "Rarely", "Sometimes",
"Rarely"), single_lang_environm_home = c(80, 0, 100, 75, 95,
70, 30), dual_lang_environm_home = c(20, 60, 0, 23, 0, 20, 70
), dense_code_sw_home = c(0, 40, 0, 2, 5, 10, 0), between_sentence_sw_home = c("Sometimes",
"Most of the time", "Sometimes", "Sometimes", "Never", "Sometimes",
"Rarely"), within_sentence_sw_home = c("Sometimes", "Most of the time",
"Most of the time", "Rarely", "Rarely", "Rarely", "Sometimes"
)), row.names = c(NA, 7L), class = "data.frame")
I am trying to recode the character values as numbers like this:
exampledata[exampledata == "Always"] <- 100
exampledata[exampledata == "Frequently"] <- 75
exampledata[exampledata == "Most of the time"] <- 75
exampledata[exampledata == "Sometimes"] <- 50
exampledata[exampledata == "Rarely"] <- 25
exampledata[exampledata == "Never"] <- 0
When I do this, I get the error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
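A note on where this error likely comes from: comparing a whole data frame with a string applies `==` column by column, and for a `Date` column base R first tries to convert the string with `as.Date()`, which fails for text such as "Always". The sketch below reproduces the message on a made-up two-column data frame (`toy` and its values are hypothetical, not the real dataset).

```r
# Hypothetical toy data: one Date column and one character column
toy <- data.frame(
  DOB = as.Date(c("2019-09-16", "2019-09-25")),
  switching_home = c("Sometimes", "Rarely"),
  stringsAsFactors = FALSE
)

# Comparing the whole data frame also compares the Date column with "Always",
# which makes R call as.Date("Always") and fail
msg <- tryCatch(
  {
    toy == "Always"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Restricting the comparison to the character column avoids the conversion
print(toy$switching_home == "Rarely")
```

So the dates themselves can be perfectly valid; it is the label being compared against them that cannot be read as a date.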
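I suspected this had to do with the date columns in my dataset (imported from an xlsx file), so I tried a few things, because I had read that it could be a locale or date-format problem.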
exampledata$DOB <- openxlsx::convertToDate(exampledata$DOB)
exampledata$DOB <- as.Date(exampledata$DOB, format = "%d/%m/%y") # recorded as DD/MM/YYYY
exampledata$DOB <- lubridate::ymd(exampledata$DOB, locale = "English")
Someone suggested using mutate, so I also tried:
exampledata <- mutate(exampledata, DOB = as.Date(DOB, "%d/%m/%y"))
When I run:
> class(exampledata$DOB)
[1] "Date"
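So it clearly shows up as a Date. However, when I open the data frame in the viewer window to explore it visually and hover the cursor over the variable, "column 1: unknown" appears under my cursor, which makes me think it was not converted to the expected (?) date format.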
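I have read through similar questions, but I am not sure why the column shows up as a Date when I run class and still causes a problem. Besides, I should only be touching values in the character variables, so I do not see why the dates get involved. Also, people keep mentioning them, but I could not find out what these standard unambiguous date formats actually are.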
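On the "standard unambiguous" formats: when no `format` is supplied, `as.Date()` tries "%Y-%m-%d" and then "%Y/%m/%d" on a character string. A small sketch with made-up date strings follows; note that `%Y` is the four-digit year, whereas `%y` expects a two-digit year.

```r
# Both strings match one of the default formats that as.Date() tries
d1 <- as.Date("2019-09-16")   # "%Y-%m-%d"
d2 <- as.Date("2019/09/16")   # "%Y/%m/%d"
print(d1 == d2)

# A day-first string needs an explicit format; %Y is the four-digit year
d3 <- as.Date("16/09/2019", format = "%d/%m/%Y")
print(d3)
```

A column that is already of class Date does not need any of these conversions again.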
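Finally, when I used dput to create a reproducible example, I could see that my dates had been converted to numbers, yet printing the column shows dates, so I am really confused: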
exampledata$DOB
[1] "2019-09-16" "2019-09-25" "2021-06-07" "2020-03-07" "2020-02-22" "2020-02-15" "2020-03-03"
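The numbers in the dput output are expected: a Date is stored as the number of days since 1970-01-01 with a "Date" class attribute, and printing formats that number as a calendar date. A short sketch using the first DOB value from the example:

```r
d <- as.Date("2019-09-16")

# The underlying value is a day count since 1970-01-01
print(unclass(d))
print(class(d))

# Going the other way: turn the day count back into a Date
back <- as.Date(18155, origin = "1970-01-01")
print(back)
```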
If anyone has any ideas, I would be grateful for the help.
Finally, here is my version information (the operating system is Windows):
> R.version.string
[1] "R version 4.0.3 (2020-10-10)"
Best answer
Here is a dplyr approach using mutate with across:
library(dplyr)
df %>%
  mutate(across(
    c(switching_home, between_sentence_sw_home, within_sentence_sw_home),
    ~ case_when(
      . == "Always" ~ 100,
      . == "Frequently" ~ 75,
      . == "Most of the time" ~ 75,
      . == "Sometimes" ~ 50,
      . == "Rarely" ~ 25,
      . == "Never" ~ 0,
      TRUE ~ NA_real_
    )
  ))
DOB date_today_ppt_SEEQ switching_home single_lang_environm_home dual_lang_environm_home dense_code_sw_home between_sentence_sw_home
1 2019-09-16 2019-09-16 50 80 20 0 50
2 2019-09-25 2019-09-25 75 0 60 40 75
3 2021-06-07 2021-06-07 50 100 0 0 50
4 2020-03-07 2020-03-07 50 75 23 2 50
5 2020-02-22 2020-02-22 25 95 0 5 0
6 2020-02-15 2020-02-15 50 70 20 10 50
7 2020-03-03 2020-03-03 25 30 70 0 25
within_sentence_sw_home
1 50
2 75
3 75
4 25
5 25
6 25
7 50
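For comparison, a base R sketch of the same idea that needs no packages: select only the character columns and map each label through a named lookup vector. This is an alternative to the dplyr code above, not part of the original answer; `toy`, `scores`, and `chr_cols` are hypothetical names, and the data is a made-up subset of the example.

```r
# Hypothetical toy data with the same kinds of columns as in the question
toy <- data.frame(
  DOB = as.Date(c("2019-09-16", "2019-09-25", "2021-06-07")),
  switching_home = c("Sometimes", "Most of the time", "Rarely"),
  dense_code_sw_home = c(0, 40, 5),
  stringsAsFactors = FALSE
)

# Named vector mapping each label to its score (values taken from the question)
scores <- c("Always" = 100, "Frequently" = 75, "Most of the time" = 75,
            "Sometimes" = 50, "Rarely" = 25, "Never" = 0)

# Pick only the character columns, so Date and numeric columns are never compared
chr_cols <- vapply(toy, is.character, logical(1))

# Look up each label by name; labels missing from `scores` become NA
toy[chr_cols] <- lapply(toy[chr_cols], function(x) unname(scores[x]))

str(toy)
```

Unlike assigning numbers into a character column, which leaves the values stored as text, this replaces each column with a numeric vector.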
Regarding "R: Error in charToDate(x): character string is not in a standard unambiguous format", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73804841/