r - Error in read.table when using colClasses

I am reading a text file (with read.table) that has three columns, one of which contains character strings such as "000000", but I get 0 instead. I tried:

X<-read.table(ouvrefic, header=TRUE, row.names=1, sep="",colClasses=c("integer","character","factor"))

I get:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
scan() expected 'an integer', got '"1"' (problem comes from row.names, I suppose)

How can I do this?

Thanks a lot.

The beginning of my text file:

"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"

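Best answer

The problem is in the colClasses argument:

First, you have four columns, even though the first one is used as row.names. The vector therefore needs four elements.

Second, if you need all the zeros to be displayed correctly, that column must be read as character.

The following works: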

df <- read.table(header=TRUE, text='"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"
"4" "18369" "0000000000000" "1230,0,0"
"5" "18369" "0000000000000" "1330,0"
"6" "18369" "0000000000000" "2330,0"
"7" "18369" "0000000000000" "3330"', 
row.names=1, 
colClasses=c('character', 'character',"character","factor"))

Output:

> df
  dates       Atscan2   pqrPQR
1 18369 0000000000000     1110
2 18369 0000000000000   1220,0
3 18369 0000000000000     2220
4 18369 0000000000000 1230,0,0
5 18369 0000000000000   1330,0
6 18369 0000000000000   2330,0
7 18369 0000000000000     3330

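As you can see above, the problem is that when a column's values are quoted (as in the dates column), the integer option in colClasses does not work, which is why I read that column as character as well. You can always convert it to integer afterwards with as.integer.

Akrun offered a more direct solution in the comments: strip the double quotes from the lines returned by readLines first, then apply colClasses to the columns: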
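
A minimal sketch of that follow-up conversion, assuming the data is read the same way as in the example above. A shortened copy of the sample data is inlined so the snippet runs on its own:

```r
# Read every column as character (factor for the last one) so the quoted
# values are accepted and the leading zeros in Atscan2 survive.
df <- read.table(header = TRUE, row.names = 1,
                 text = '"" "dates" "Atscan2" "pqrPQR"
"1" "18369" "0000000000000" "1110"
"2" "18369" "0000000000000" "1220,0"
"3" "18369" "0000000000000" "2220"',
                 colClasses = c("character", "character", "character", "factor"))

# Convert only the dates column to integer after the file has been read.
df$dates <- as.integer(df$dates)

# Inspect the resulting column types.
str(df)
```

Atscan2 is left as character on purpose: converting it to a number would discard the zeros again.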

 df <- read.table(text=gsub('[\\"]', '', readLines('ouvrefic.txt')),
                  row.names=1, 
                  colClasses=c('character', 'integer', 'character', 'factor'))
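
To try this approach without the original file, here is a sketch that first writes a few of the sample lines to a temporary file. The temporary file is only a stand-in for ouvrefic.txt, and the name df2 is made up for illustration:

```r
# Write a shortened copy of the sample data to a temporary file
# (an assumed stand-in for the real ouvrefic.txt).
tmp <- tempfile(fileext = ".txt")
writeLines(c('"" "dates" "Atscan2" "pqrPQR"',
             '"1" "18369" "0000000000000" "1110"',
             '"2" "18369" "0000000000000" "1220,0"',
             '"3" "18369" "0000000000000" "2220"'), tmp)

# Strip the double quotes from every line, then let colClasses set the
# type of each column directly, including integer for dates.
df2 <- read.table(text = gsub('"', '', readLines(tmp), fixed = TRUE),
                  header = TRUE, row.names = 1,
                  colClasses = c("character", "integer", "character", "factor"))

# Show the class of each column, then remove the temporary file.
sapply(df2, class)
unlink(tmp)
```

Once the quotes are gone, the header line has one field fewer than the data lines, so read.table treats the first column as row names, which is consistent with row.names=1.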

This question and answer, r - Error in read.table when using colClasses, come from a similar question on Stack Overflow: https://stackoverflow.com/questions/28900927/
