Does anyone know why the colClasses argument does not seem to work in read.xlsx?
I created a sample *.xlsx file:
> library(xlsx)
> d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
> str(d1)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: int 1 2 3
$ D: num 1.1 NA NA
> write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)
Then I tried to read it with read.xlsx, both without and with the colClasses argument:
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
> str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: num 1 2 3
$ D: num 1.1 NA NA
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
> str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: num 1 2 3
$ D: num 1.1 NA NA
The problem is that colClasses seems to have no effect. Any ideas? Thanks for your help.
Alexey
P.S. I am using R 3.0.1 and xlsx 0.5.1.
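Best answer
colClasses= is working; the problem is that on your system the default behaviour when importing data is to convert character columns to factors.
If you import test.xlsx and set all columns to "character", you will see that every column, including the numeric ones, is imported as a factor.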
d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: Factor w/ 3 levels "1","2","3": 1 2 3
$ D: Factor w/ 1 level "1.1": 1 NA NA
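As a rough illustration of the mechanism, the sketch below uses base R's data.frame() rather than read.xlsx itself, so it needs neither a spreadsheet nor a Java setup. The variable names are made up for the example. My understanding from the xlsx documentation is that read.xlsx forwards extra arguments to data.frame(), which would explain why the same switch matters there, but treat that as an assumption.

```r
# Character data to be placed in a data frame (illustrative values only).
x <- c("a", "b", "c")

# With stringsAsFactors = TRUE the character vector is stored as a factor.
as_factor <- data.frame(x = x, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE it stays a character vector.
as_char <- data.frame(x = x, stringsAsFactors = FALSE)

str(as_factor)
str(as_char)
```

Passing the argument explicitly avoids depending on the session default, which differs between R versions.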
To make sure that character columns are not converted to factors, you can add the argument stringsAsFactors=FALSE to read.xlsx().
d2 = read.xlsx('test.xlsx', sheetName='Sheet1',
               colClasses=c(B='character', A='character'), stringsAsFactors=FALSE)
str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: chr "A" "B" "C"
$ B: chr "a" "b" "c"
$ C: num 1 2 3
$ D: num 1.1 NA NA
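If the data has already been imported with factor columns, one possible workaround is to convert them afterwards. This is a minimal sketch in base R, assuming a data frame shaped like the one above; it is built by hand here instead of being read from test.xlsx, and the helper name is_fac is just for illustration.

```r
# Stand-in for an imported data frame whose text columns became factors.
d <- data.frame(A = LETTERS[1:3], B = letters[1:3], C = 1:3,
                stringsAsFactors = TRUE)

# Flag the factor columns, then convert only those to character.
is_fac <- vapply(d, is.factor, logical(1))
d[is_fac] <- lapply(d[is_fac], as.character)

str(d)
```

For a numeric column that was imported as a factor, as.numeric(as.character(x)) recovers the values, whereas as.numeric(x) alone returns the internal level codes.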
A similar question about read.xlsx and colClasses can be found on Stack Overflow: https://stackoverflow.com/questions/18279268/