r - Preserving numbers as character fields with write.csv (base R)

Tags: r csv types

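I have data.frames with character columns that contain numbers (like '0123', '1234', etc.). When I write them to CSV and read them back, they end up as numeric columns. Both write.csv and read.csv have a quote argument that, by default, should quote strings on output and respect them on input, so this behavior is unexpected.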

How can I avoid this without having to manually specify colClasses when I read the file back in?

Reproducible example:

# dummy data
fake_data <- 
  data.frame(num=1:25, char=letters[1:25], charnum=as.character(1:25),
             stringsAsFactors=FALSE)

# check out col classes - all good
sapply(fake_data, class)

#       num        char     charnum 
# "integer" "character" "character" 

# write it to a file and read it back
fpath <- '~/Desktop/fake_data.csv'
write.csv(fake_data, fpath, row.names=FALSE)
fake_data2 <- read.csv(fpath, stringsAsFactors=FALSE)

# but now look, different classes!
sapply(fake_data2, class)

#       num        char     charnum 
# "integer" "character"   "integer"

The problem seems to be on the reading side, because the file is written with quotes:
> cat(readLines(fpath))
"num","char","charnum" 1,"a","1" 2,"b","2" 3,"c","3" 4,"d","4" 5,"e","5" 6,"f","6" 7,"g","7" 8,"h","8" 9,"i","9" 10,"j","10" 11,"k","11" 12,"l","12" 13,"m","13" 14,"n","14" 15,"o","15" 16,"p","16" 17,"q","17" 18,"r","18" 19,"s","19" 20,"t","20" 21,"u","21" 22,"v","22" 23,"w","23" 24,"x","24" 25,"y","25"

Session info:

R version 3.1.1 (2014-07-10) | Platform: x86_64-apple-darwin13.1.0 (64-bit)
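
The read-side behavior can be reproduced in isolation. The sketch below uses only base R and assumes the documented behavior of read.table: the quote characters only delimit fields, and any column not covered by colClasses is then passed through type.convert. The variable names are illustrative.

```r
# The quotes delimit each field and are dropped while scanning
fields <- scan(text = '"1","2","3"', what = "", sep = ",",
               quote = "\"", quiet = TRUE)
fields  # a character vector with no quote characters left

# Without colClasses, read.csv converts each column with type.convert,
# which sees only the digits and picks a numeric type
converted <- type.convert(fields, as.is = TRUE)
class(converted)
```

So the quoting in the file is not used as a type hint when the file is read back.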

Best answer

Thanks for the answers. Having looked into this further, I have the following to add.

Option 1: just use data.table::fread, which works the way I wanted

Option 2: do this to construct the colClasses vector

 # read header and first data line
 first_data_line <- strsplit(readLines(fpath, n=2L)[2], ',')[[1]]

 # find which fields have double quotes
 char_fields <- grep('"', first_data_line)

 # construct colClasses vec
 cc <- rep(NA, length(first_data_line))
 cc[char_fields] <- 'character'
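
Put together, Option 2 might look like the sketch below. The helper name read_csv_keep_quoted is hypothetical (it is not part of base R). It assumes that the file was written by write.csv with default quoting, that no field in the first data row contains an embedded comma, and that the first data row's quoting is representative of every row.

```r
# Hypothetical helper: infer colClasses from the quoting on the first data
# line, then pass them to read.csv
read_csv_keep_quoted <- function(path) {
  # Line 1 is the header, so line 2 is the first data row
  first_data_line <- strsplit(readLines(path, n = 2L)[2], ",", fixed = TRUE)[[1]]

  # Fields wrapped in double quotes were character (or factor) when written
  char_fields <- grep('"', first_data_line, fixed = TRUE)

  # NA lets read.csv guess the type; "character" forces it
  cc <- rep(NA_character_, length(first_data_line))
  cc[char_fields] <- "character"

  read.csv(path, colClasses = cc, stringsAsFactors = FALSE)
}

# Round trip with the data from the question, written to a temporary file
fake_data <- data.frame(num = 1:25, char = letters[1:25],
                        charnum = as.character(1:25),
                        stringsAsFactors = FALSE)
fpath <- tempfile(fileext = ".csv")
write.csv(fake_data, fpath, row.names = FALSE)

fake_data2 <- read_csv_keep_quoted(fpath)
sapply(fake_data2, class)
```

Splitting on a bare comma is the fragile part: a quoted field that itself contains a comma would shift the field positions, so a real file may need a more careful parse of that first row.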

I'm a data.table fan anyway, so #1 is probably what I'll do.

Regarding "r - Preserving numbers as character fields with write.csv (base R)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27432339/
