r - 同一文件输入的多个分隔符 R

标签 r csv import delimiter separator

我一直在寻找答案,但只找到了与 C 或 C# 相关的内容。
我意识到 R 的大部分内容是用 C 编写的,但我对它的了解并不存在。
我对 R 也比较陌生。
我正在使用当前的 Rstudio。

我想这和我想要的很相似。
Read the data efficiently with multiple separating lines in R

我有一个 csv 文件,但一个变量是一个字符串,其值由 _ 分隔和 -我想知道是否有一个包或额外的代码可以在阅读时执行以下操作。命令。

"1","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,218,4,93,1377907200000
"2","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,390,5,157,1377993600000
"3","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,376,5,193,1.37808e+12
"4","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",1,35,1,15,1377907200000
"5","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",12,11258,117,2843,1377993600000
"6","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",5,4659,56,1826,1.37808e+12
"7","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",7,7296,136,2684,1377907200000
"8","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,4533,35,1632,1377907200000
"9","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,421,6,161,1377993600000
"10","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,57,2,23,1.37808e+12

示例行:
Name    Name1   *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000

所以它很容易阅读
results <- read.delim("~/results", header=F)

但后来我仍然有字符串 *XYZ_Name3_KB_MobApp_M-18-25_AU_PI

所需的输出(由 _- 分隔):
Name    Name1   *XYZ   Name3  KB   MobApp   M 18 25  AU  PI ANDROID 2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000

但不拆分时间字符串。

---- 感谢@Henrik 和@AnandaMahto 提供代码和包。 ----
library(splitstackshape)

# split concatenated column by `_`
df4 <- concat.split(data = df3, split.col = "V3", sep = "_", drop = TRUE)

# split the remaining concatenated part by `-`
df5 <- concat.split(data = df4, split.col = "V3_5", sep = "-", drop = TRUE)

最佳答案

尝试这个:

# dummy data
df <- read.table(text="
Name    Name1   *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000
Name    Name2   *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000
", as.is = TRUE)

# replace "_" to "-"
df_V3 <- gsub(pattern="_", replacement="-", df$V3, fixed = TRUE)

# strsplit, make dataframe
df_V3 <- do.call(rbind.data.frame, strsplit(df_V3, split = "-"))

# output, merge columns
output <- cbind(df[, c(1:2)],
                df_V3,
                df[, c(4:ncol(df))])

基于下面的评论,这是另一个相关选项,但使用 read.table而不是 strsplit .
splitCol <- "V3"
temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
names(temp) <- paste(splitCol, seq_along(temp), sep = "_")
cbind(df[setdiff(names(df), splitCol)], temp)

关于r - 同一文件输入的多个分隔符 R,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20075135/

相关文章:

r - 具有多个条件的 for 循环的向量化

r - 使用 ddply 时保留有序因子

r - 将逗号分隔的字符串转换为 R 中的整数

Ubuntu中的Python导入模块和文件

sql - 将 Excel 电子表格数据导入现有的 SQL 表?

html - 将带有线框的 3D 表面从 R 导出到 HTML

python - 如何将键值管道分隔文件转换为带标题的完美 csv 文件

excel - Access 2013 中的类型转换失败

java - csv 行中的多个部分

matlab - Matlab,从csv文件读取多个2d数组