r - How do I convert a text file into a data frame in R?

Tags: r mongodb dataframe na read.table

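I am trying to export data points from MongoDB. Unfortunately, I could not connect it directly to RStudio, so I saved the query results to a text file and am now trying to read that file into R.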

"cityid", "count"
"102","2"
"55","31"
"119","7"
"206","1"
"18","2"
"15","1"
"32","3"
"14","1"
"54","2"
"23","85"
"158","3"
"266","1"
"9","1"
"34","1"
"159","1"
"31","1"
"22","2"
"209","2"
"121","4"
"73","12"
"350","2"
"311","2"
"377","2"
"230","7"
"290","1"
"49","2"
"379","2"
"75","1"
"59","6"
"165","3"
"19","8"
"13","40"
"126","13"
"243","12"
"325","1"
"17","1"
"null","235"
"144","2"
"334","1"
"40","12"
"7","34"
"181","40"
"349","4"

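The file looks like the sample above. I want to turn it into a data frame that I can use as a lookup when doing calculations on other datasets.

This is how I tried to build the data frame: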

L <- readLines(file.choose())
L.df <- as.data.frame(L)

list <- strsplit(L.df, ",")
library("plyr")
df <- ldply(list)
colnames(df) <- c("city_id", "count")
str(df)
df$city_id <- suppressWarnings(as.numeric(as.character(df$city_id)))

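In the last line I try to convert the character values to numeric, but the conversion fails and every value is coerced to NA.

Does anyone have a better suggestion for getting these into a numeric table? Or is there a better way to bring MongoDB data into R without copying and pasting it into a text file? I did manage to connect to MongoDB with RMongo, but the syntax was too complicated for me to follow. The query I used is: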

db.getCollection('logging_app_location_view_logs').aggregate([
{"$group": {"_id": "$city_id", "total": {"$sum":1}}}
]).forEach(function(l){

  print('"' + l._id + '","' + l.total + '"');

});

Thanks in advance for your help!

Best answer

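Because header = TRUE is already passed to read.table, you do not need to set the column names again. The colClasses argument controls the class each column is read as, and na.strings = 'null' makes the literal string null read as NA.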

df <- read.table(file.choose(), header = TRUE, sep = ",", colClasses = c('character', 'character'), na.strings = 'null')

# convert character to numeric format
char_cols <- which(sapply(df, class) == 'character')  # identify character columns
df[char_cols] <- lapply(df[char_cols], as.numeric)   # convert character to numeric column
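
To see why the manual readLines/strsplit route ends in NA while read.table does not, here is a minimal sketch on a few rows copied from the question. It uses read.table's text argument instead of file.choose() so that it runs without a file. The names txt, parts and quoted are only illustrative, and the header is written without the space after the comma, which is an assumption about how the file should look rather than something I checked against the original export.

```r
# A few rows in the same shape as the exported file.
# Assumption: the header has no space after the comma.
txt <- '"cityid","count"
"102","2"
"55","31"
"null","235"
"7","34"'

# Splitting a raw line by hand keeps the literal quote characters,
# so as.numeric() sees strings such as "\"102\"" and returns NA.
lines  <- strsplit(txt, "\n")[[1]]
parts  <- strsplit(lines[2], ",")[[1]]
quoted <- suppressWarnings(as.numeric(parts))

# read.table() strips the quotes itself; na.strings turns "null" into NA.
df <- read.table(text = txt, header = TRUE, sep = ",",
                 colClasses = "character", na.strings = "null")

# Convert every column from character to numeric.
df[] <- lapply(df, as.numeric)
str(df)
```

If you would rather keep the manual approach, stripping the quotes first (for example with gsub('"', '', x)) before calling as.numeric should also work, but letting read.table handle the quoting is less error-prone.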

Regarding "r - How do I convert a text file into a data frame in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41643176/
