r - Using read.csv.ffdf() throws an error

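I am trying to read a large dataset (3.7 million rows, 180 columns) into R using the ff package. The dataset contains several data types: factor, logical, and numeric.

The problem occurs when reading the numeric variables. For example, one of my columns is: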

TotalBeforeTax
126.9
88.0
124.5
90.9
...

When I try to read the data, the following error is thrown:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  scan() expected 'a real', got '"126.90000"'

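I tried using the colClasses argument to declare the class as integer (it was already declared as numeric), but to no avail. I also tried changing it to "a real" (whatever that means); the read then starts, but at some point it throws: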

Error in methods::as(data[[i]], colClasses[i]) : 
  no method or default for coercing “character” to “a real”

(My guess is that it hit an NA and did not know how to handle it.)

Interestingly, if I declare the column as a factor, everything reads in fine.

What gives?

Best answer

OK, so I managed to solve this with a crude workaround. First, split the .csv file using a CSV file splitter application. Then run the following code:

## First, set the folder where the split .csv files are,
## and the base name the split files share.

sourceDir <- "split_files_folder"
sourceFile <- file.path(sourceDir, "common_name_of_split_files")

## Now set the number of split pieces. This must be a number, not a string.

pieces <- 10  # placeholder value: replace with the actual number of pieces

## Set the destination folder for the tab-delimited text file.
## Set the output file name.

destDir <- "destination_folder"
destFile <- file.path(destDir, "datafile.txt")

## Now loop over the pieces (named <base>_1.csv, <base>_2.csv, ...).

for (i in seq_len(pieces)) {
  temp <- read.csv(file = paste0(sourceFile, "_", i, ".csv"))
  ## The first piece creates the file and writes the header;
  ## every later piece is appended without a header.
  write.table(temp, file = destFile, append = (i > 1), quote = FALSE,
              sep = "\t", row.names = FALSE, col.names = (i == 1))
}

Voilà! You now have one huge tab-delimited text file!
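
A likely reason the workaround helps: the error message shows the value wrapped in double quotes ('"126.90000"'), and in base R's read.table() family quoting is only interpreted for columns read as character. Declaring such a column numeric in colClasses therefore makes scan() choke on the quote characters, while read.csv() without colClasses reads the values fine and write.table(..., quote = FALSE) writes them back out unquoted. The sketch below reproduces this with base R only, on made-up data (csv_text is a hypothetical stand-in for the real file), and shows an alternative: read the column as character and convert it afterwards. Whether read.csv.ffdf() behaves identically on the original file is an assumption.

```r
# Made-up sample: numeric values wrapped in double quotes, plus one NA,
# mimicking what the error message in the question suggests.
csv_text <- 'TotalBeforeTax\n"126.90000"\n"88.00000"\nNA\n"124.50000"'

# Declaring the column numeric fails: quotes are only interpreted for
# columns read as character, so scan() sees the literal quote characters.
res <- tryCatch(
  read.csv(text = csv_text, colClasses = "numeric"),
  error = function(e) conditionMessage(e)
)
print(res)  # prints the error message instead of a data frame

# Reading the column as character and converting afterwards works.
df <- read.csv(text = csv_text, colClasses = "character")
df$TotalBeforeTax <- as.numeric(df$TotalBeforeTax)
str(df)
```

This would also fit the observation in the question that declaring the column as a factor reads without error, since factor columns are read as text first.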

For "r - Using read.csv.ffdf() throws an error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22859048/