Reading big data in R with read.big.matrix

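I am using read.big.matrix to read a data set of dimension 3131875 x 5 into R. My data has both character and numeric columns, including a date variable. The command I am using is: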

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
                       header=TRUE, 
                       backingfile="session.bin",
                       descriptorfile="session.desc",
                       type = NA)

In this case, however, R does not accept type = NA, and I get this error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  : 
  Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",  :
  Because type was not specified, we chose double based on the first line of data.

I need to know what the type should be here. I tried options such as double, but that raised the same error.

Please help.

Best answer

From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

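So you will not be able to read data that contains a mix of character, numeric, integer, date, and other types. A big.matrix, like an ordinary R matrix, holds a single type, which is why no value of type can make this file load as it is. You can preprocess the file instead, for example by using a different program to convert the character variables to an integer representation (similar to converting them to factors in R).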
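
The conversion described above can also be done in R itself when the file fits in memory as a data frame. The following is only a minimal sketch under assumptions: the column names (id, city, date, sales), the date format, and the sample values are hypothetical, since the question does not show the real columns. It encodes a character column as integer codes, encodes dates as days since 1970-01-01, writes a numeric-only file, and reads that file back with read.big.matrix if bigmemory is installed.

```r
# Hypothetical sample with mixed column types; the real file's columns
# are not shown in the question.
df <- data.frame(
  id    = 1:4,
  city  = c("Pune", "Delhi", "Pune", "Mumbai"),
  date  = c("2012-10-01", "2012-10-02", "2012-10-02", "2012-10-04"),
  sales = c(10.5, 20, 7.25, 3),
  stringsAsFactors = FALSE
)

# Encode the character column as integer codes and keep the lookup
# table so the codes can be mapped back to labels later.
city_levels <- sort(unique(df$city))
df$city <- match(df$city, city_levels)

# Encode dates as the number of days since 1970-01-01.
df$date <- as.numeric(as.Date(df$date, format = "%Y-%m-%d"))

# Write a file that now contains numeric values only.
numeric_file <- tempfile(fileext = ".txt")
write.table(df, numeric_file, sep = ",", row.names = FALSE,
            col.names = TRUE, quote = FALSE)

# Read it back as a big.matrix (skipped if bigmemory is not installed).
if (requireNamespace("bigmemory", quietly = TRUE)) {
  x <- bigmemory::read.big.matrix(numeric_file, sep = ",",
                                  header = TRUE, type = "double")
  print(dim(x))
}
```

Keeping city_levels around matters: the big.matrix stores only the codes, so the labels have to be recovered with city_levels[code]. For a file-backed matrix, the backingfile and descriptorfile arguments from the question can be added to the read.big.matrix call.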

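Edit:

The bigmemory website has an example that uses a Python script to preprocess data, changing character information to integers. That script was written for a specific data set, but you may be able to use it as a guide for your own data.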
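
This is not the script from the bigmemory website, only a rough R sketch of the same preprocessing idea for a file too large to load at once. The helper name encode_column_chunked, its arguments, and the sample data are all hypothetical. It assumes a comma-separated file with an unquoted header line, streams the file in chunks, and replaces one character column with integer codes assigned in order of first appearance.

```r
# Hypothetical helper (not part of bigmemory): stream a delimited file in
# chunks and replace one character column with integer codes.
encode_column_chunked <- function(infile, outfile, char_col,
                                  chunk_size = 100000L, sep = ",") {
  con_in <- file(infile, open = "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)

  # Copy the header line through unchanged and recover the column names.
  header <- readLines(con_in, n = 1L)
  writeLines(header, con_out)
  col_names <- strsplit(header, sep, fixed = TRUE)[[1]]

  levels_seen <- character(0)
  repeat {
    lines <- readLines(con_in, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- read.table(text = lines, sep = sep, header = FALSE,
                        col.names = col_names, stringsAsFactors = FALSE)
    # Extend the lookup table with labels first seen in this chunk,
    # then replace the labels by their position in that table.
    values <- as.character(chunk[[char_col]])
    levels_seen <- union(levels_seen, values)
    chunk[[char_col]] <- match(values, levels_seen)
    write.table(chunk, con_out, sep = sep, row.names = FALSE,
                col.names = FALSE, quote = FALSE)
  }
  levels_seen
}

# Small demonstration on a temporary file with made-up data.
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("id,city,sales", "1,Pune,10.5", "2,Delhi,20", "3,Pune,7.25"),
           infile)
lookup <- encode_column_chunked(infile, outfile, "city", chunk_size = 2L)
```

The returned lookup vector maps each code back to its label (lookup[code]). The output file holds numeric values only, so it satisfies the single-type requirement of read.big.matrix. A date column would need an analogous conversion step inside the loop.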

Source: this question on Stack Overflow, https://stackoverflow.com/questions/12725603/
