r - Automatically converting data.frame columns

Tags: r dataframe

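I am interested in a way to automatically convert a data frame made up of factor columns (such as df) to the best possible types, similar to what read.table would create (such as df2). One possibility is to write the data frame to a string and read it back in with read.table. Are there any other ways?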

> df <- data.frame(a=c(" 1"," 2", " 3"),b=c("a","b","c"),c=c(" 1.0", "NA", " 2.0"),d=c(" 1", "B", "2"))
> str(df)
'data.frame':   3 obs. of  4 variables:
 $ a: Factor w/ 3 levels " 1"," 2"," 3": 1 2 3
 $ b: Factor w/ 3 levels "a","b","c": 1 2 3
 $ c: Factor w/ 3 levels " 1.0"," 2.0",..: 1 3 2
 $ d: Factor w/ 3 levels " 1","2","B": 1 3 2
> df2 <- with(df, data.frame(a=as.integer(a),b=b,c=as.numeric(c),d=as.character(d), stringsAsFactors=FALSE))
> str(df2)
'data.frame':   3 obs. of  4 variables:
 $ a: int  1 2 3
 $ b: Factor w/ 3 levels "a","b","c": 1 2 3
 $ c: num  1 3 2
 $ d: chr  " 1" "B" "2"

Best answer

Use the function that read.table itself uses: type.convert

Example:

df <- data.frame(a=c(" 1"," 2", " 3"), b=c("a","b","c"), 
                 c=c(" 1.0", "NA", " 2.0"), d=c(" 1", "B", "2"),
                 stringsAsFactors=TRUE)  # explicit: R >= 4.0 defaults to FALSE
str(df)
# 'data.frame':  3 obs. of  4 variables:
#  $ a: Factor w/ 3 levels " 1"," 2"," 3": 1 2 3
#  $ b: Factor w/ 3 levels "a","b","c": 1 2 3
#  $ c: Factor w/ 3 levels " 1.0"," 2.0",..: 1 3 2
#  $ d: Factor w/ 3 levels " 1","2","B": 1 3 2
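# as.is = FALSE turns non-numeric strings into factors, as in the output below
# (recent R versions expect as.is to be given explicitly)
df[] <- lapply(df, function(y) type.convert(as.character(y), as.is = FALSE))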
df
#   a b  c  d
# 1 1 a  1  1
# 2 2 b NA  B
# 3 3 c  2  2
str(df)
# 'data.frame':  3 obs. of  4 variables:
#  $ a: int  1 2 3
#  $ b: Factor w/ 3 levels "a","b","c": 1 2 3
#  $ c: num  1 NA 2
#  $ d: Factor w/ 3 levels " 1","2","B": 1 3 2
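
To see how type.convert decides column by column, it can help to call it on plain character vectors. The sketch below reuses the values from the example and passes as.is = TRUE, an assumption made here so that non-numeric strings stay character instead of becoming factors; the variable names are illustrative only.

```r
# Each call inspects the strings and returns the narrowest type that fits
a_col <- type.convert(c(" 1", " 2", " 3"), as.is = TRUE)      # all whole numbers
c_col <- type.convert(c(" 1.0", "NA", " 2.0"), as.is = TRUE)  # decimals, "NA" becomes NA
d_col <- type.convert(c(" 1", "B", "2"), as.is = TRUE)        # "B" is not a number

class(a_col)  # "integer"
class(c_col)  # "numeric"
class(d_col)  # "character"
```

With as.is = FALSE, the last vector would come back as a factor instead, which is what the str() output above shows for column d.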

(But I'm not sure whether this is what you are looking for...)


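Update: If you want a colClasses-style function, you could try something like the function below. Unlike your question title, this is not "automatic", but it does let you specify the class of each column instead of leaving the decision to type.convert.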

toColClasses <- function(inDF, colClasses) {
  if (length(colClasses) != length(inDF)) stop("Please specify colClasses for each column")
  inDF[] <- lapply(seq_along(colClasses), function(y) {
    # An empty string means "leave this column unchanged"
    if (colClasses[y] == "") return(inDF[[y]])
    # Otherwise look up the named conversion function and apply it to the strings
    FUN <- match.fun(colClasses[y])
    suppressWarnings(FUN(as.character(inDF[[y]])))
  })
  inDF
}

You can use it as follows:

df <- data.frame(a = c(" 1"," 2", " 3"), b = c("a","b","c"), 
                 c = c(" 1.0", "NA", " 2.0"), d = c(" 1", "B", "2"),
                 stringsAsFactors = TRUE)  # explicit: R >= 4.0 defaults to FALSE

df2 <- toColClasses(df, c("as.integer", "", "as.numeric", "as.character"))
df2
#   a b  c  d
# 1 1 a  1  1
# 2 2 b NA  B
# 3 3 c  2  2
str(df2)
# 'data.frame':  3 obs. of  4 variables:
#  $ a: int  1 2 3
#  $ b: Factor w/ 3 levels "a","b","c": 1 2 3
#  $ c: num  1 NA 2
#  $ d: chr  " 1" "B" "2"

However, you would have to do more work on the function for it to accept a wider range of as... functions.
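
One possible direction, offered only as a sketch: accept a list of converters instead of a character vector, so that each entry can be a function, a function name, or NULL to leave the column untouched. The name convertCols, the NULL convention, and the sample `when` column are all hypothetical and not part of the original answer.

```r
# Hypothetical variant of toColClasses: `converters` is a list with one entry
# per column, each a function, a function name, or NULL (keep column as is).
convertCols <- function(inDF, converters) {
  if (length(converters) != length(inDF)) stop("Please specify one converter per column")
  inDF[] <- Map(function(col, conv) {
    if (is.null(conv)) return(col)
    # match.fun accepts either a function object or its name as a string
    match.fun(conv)(as.character(col))
  }, inDF, converters)
  inDF
}

df <- data.frame(a = c(" 1", " 2", " 3"), b = c("a", "b", "c"),
                 when = c("2020-01-01", "2020-02-01", "2020-03-01"),
                 stringsAsFactors = TRUE)
df3 <- convertCols(df, list("as.integer", NULL, as.Date))
str(df3)
```

Unlike the original function, this sketch does not wrap the call in suppressWarnings, so failed conversions remain visible as warnings.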

Source: a similar question on Stack Overflow about automatically converting data.frame columns in R: https://stackoverflow.com/questions/18893571/
