r - Parsing multiple delimiters and embedded braces in R

Tags: r dataframe

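I inherited a project from someone who did not think ahead to data analysis. As a result, my output data files contain multiple delimiters: several types of brackets, grouped data nested to different depths, and commas separating the numbers inside the brackets. A few plain-text sentences are thrown in for good measure.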

Can anyone offer a simple way to convert these embedded structures into data frames in R?

Here is an example:

[(3, None, 1), (1, 0.36, 1), (3, None, 1), (2, 0.41, 1), (5, 0.47, 1), (6, 0.36, 1), (2, 0.45, 1), (2, 0.36, 1), (4, 0.39, 1), (6, 0.34, 1), (1, 0.47, 1), (7, 0.44, 1), (4, 0.39, 1), (6, 0.38, 1), (9, 0.39, 1), (5, 0.37, 1), (8, 0.41, 1), (9, 0.38, 1), (1, 0.44, 1), (9, 0.38, 1), (4, 0.36, 1), (8, 0.41, 1), (7, 0.38, 1), (7, 0.41, 1), (7, 0.36, 1), (7, 0.39, 1), (9, 0.41, 1), (5, 0.36, 1), (8, 0.31, 1), (6, 0.38, 1), (1, 0.44, 1), (3, None, 1), (5, 0.59, 1), (7, 0.52, 1), (7, 0.44, 1), (7, 0.38, 1), (8, 0.34, 1), (9, 0.39, 1), (3, None, 1), (7, 0.44, 1), (7, 0.53, 1), (8, 0.36, 1), (3, 0.36, 0), (8, 0.34, 1), (5, 0.38, 1), (3, None, 1), (5, 0.52, 1), (3, None, 1), (9, 0.55, 1), (9, 0.36, 1), (4, 0.38, 1), (2, 0.73, 1), (9, 0.36, 1), (7, 0.44, 1), (4, 0.45, 1), (4, 0.62, 1), (9, 0.39, 1), (3, 0.31, 0), (1, 0.42, 1), (4, 0.34, 1), (5, 0.53, 1), (8, 0.34, 1), (3, None, 1), (8, 0.47, 1), (6, 0.39, 1), (1, 0.42, 1), (5, 0.53, 1), (1, 0.53, 1), (8, 0.62, 1), (1, 0.39, 1), (8, 0.44, 1), (8, 0.45, 1), (9, 0.38, 1), (1, 0.36, 1), (4, 0.38, 1), (6, 0.36, 1), (7, 0.36, 1), (9, 0.39, 1), (8, 0.41, 1), (8, 0.31, 1), (3, None, 1), (2, 0.36, 1), (4, 0.36, 1), (2, 0.31, 1), (9, 0.36, 1), (1, 0.31, 1), (4, 0.34, 1), (1, 0.56, 1), (7, 0.61, 1), (9, 0.38, 1), (3, None, 1), (1, 0.36, 1), (1, 0.53, 1), (5, 0.33, 1), (3, None, 1), (1, 0.39, 1), (6, 0.34, 1), (9, 0.33, 1), (4, 0.38, 1), (3, None, 1), (5, 0.44, 1), (2, 0.52, 1), (1, 0.42, 1), (6, 0.38, 1), (9, 0.33, 1), (4, 0.38, 1), (5, 0.31, 1), (6, 0.31, 1), (8, 0.31, 1), (2, 0.33, 1), (9, 0.33, 1), (1, 0.56, 1), (6, 0.38, 1), (3, None, 1), (7, 0.34, 1), (5, 0.34, 1), (2, 0.36, 1), (2, 0.47, 1), (3, None, 1), (2, 0.39, 1), (2, 0.36, 1), (6, 0.31, 1), (1, 0.53, 1), (5, 0.45, 1), (7, 0.42, 1), (5, 0.45, 1), (2, 0.39, 1), (2, 0.45, 1), (6, 0.36, 1), (2, 0.45, 1), (1, 0.39, 1), (1, 0.34, 1), (4, 0.39, 1), (2, 0.34, 1), (2, 0.31, 1), (3, 0.31, 0), (8, 0.39, 1), (6, 0.34, 1), (6, 0.31, 1), (5, 0.38, 1), (9, 0.34, 1), (7, 0.31, 1), (1, 0.33, 
1), (4, 0.38, 1), (6, 0.38, 1), (5, 0.38, 1), (9, 0.38, 1), (2, 0.5, 1), (8, 0.44, 1), (8, 0.39, 1), (4, 0.38, 1), (5, 0.5, 1), (9, 0.48, 1), (2, 0.59, 1), (8, 0.41, 1), (7, 0.41, 1), (3, None, 1), (4, 0.5, 1), (4, 0.36, 1), (7, 0.38, 1), (5, 0.44, 1), (6, 0.34, 1), (6, 0.41, 1), (3, None, 1), (7, 0.39, 1), (6, 0.34, 1), (2, 0.34, 1), (9, 0.36, 1), (4, 0.36, 1), (5, 0.38, 1), (3, None, 1), (6, 0.36, 1), (5, 0.33, 1), (4, 0.44, 1), (7, 0.34, 1), (8, 0.48, 1), (6, 0.34, 1), (8, 0.38, 1), (3, None, 1), (4, 0.31, 1), (3, 0.31, 0)]
 Percentage of correctly suppressed responses per five-target section: 
[80, 80, 100, 80]
 Average reaction time per five-target section: 
[0.4, 0.43, 0.39, 0.39]
 Percentage of correctly suppressed responses per ten-target section: 
[80, 90]
 Average reaction time per ten-target section: 
[0.41, 0.39]

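Ideally, the first line would be converted to a 3-column data frame, the second line would be ignored, the third line would become a vector of 4 integers, and so on.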

Best answer

Use readLines to read the data in, then gsub and strsplit to sort it all out:

#txt <- readLines(textConnection("<insert your text here>"))
#or probably more appropriately
txt <- readLines("filename.txt")  

# remove the label lines (lines 2, 4, 6 and 8 of the file)
txt <- txt[-c(2,4,6,8)]

# remove the enclosing [ and ] from each line
txt <- lapply(txt,function(x) substr(x,2,nchar(x)-1))

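# reformat element 1: the list of (x, y, z) tuples
txt[[1]] <- gsub("[()]","",txt[[1]])                        # drop the parentheses
txt[[1]] <- gsub("None","0",txt[[1]])                       # note: None is recoded as 0 here
txt[[1]] <- as.numeric(unlist(strsplit(txt[[1]],",")))      # flat numeric vector
txt[[1]] <- data.frame(matrix(txt[[1]],ncol=3,byrow=TRUE))  # one row per tuple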

# reformat elements 2-5
txt[2:5] <- lapply(txt[2:5],function(x) as.numeric(unlist(strsplit(x,","))))
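
The gsub("None","0",...) step above recodes None as 0, which makes a missing value look like a real measurement. Below is a minimal sketch of a variant that keeps those entries as NA instead. It assumes that None marks a missing value, and the helper name parse_tuples is hypothetical, not part of the original answer:

```r
# Hypothetical helper: turn "(3, None, 1), (1, 0.36, 1)" into a 3-column data frame
parse_tuples <- function(x, ncol = 3) {
  x <- gsub("[()\\s]", "", x, perl = TRUE)   # drop parentheses and whitespace
  x <- gsub("None", "NA", x, fixed = TRUE)   # as.numeric() reads "NA" as a missing value
  vals <- as.numeric(strsplit(x, ",", fixed = TRUE)[[1]])
  data.frame(matrix(vals, ncol = ncol, byrow = TRUE))
}

df <- parse_tuples("(3, None, 1), (1, 0.36, 1), (2, 0.41, 1)")
mean(df$X2, na.rm = TRUE)  # averages only the observed values
```

With NA in place, summaries such as mean() need na.rm = TRUE, but the missing entries no longer pull the average toward zero.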

Result:

txt

#[[1]]
#  X1   X2 X3
#1  3 0.00  1
#2  1 0.36  1
#3  3 0.00  1
#4  2 0.41  1
#5  5 0.47  1
#6  6 0.36  1
# etc... etc...
#
#[[2]]
#[1]  80  80 100  80
#
#[[3]]
#[1] 0.40 0.43 0.39 0.39
#
#[[4]]
#[1] 80 90
#
#[[5]]
#[1] 0.41 0.39
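
The code above drops the label lines by position (txt[-c(2,4,6,8)]), so it only works when every file has exactly this layout and each bracketed list sits on a single line. Below is a minimal sketch of a variant that finds every [ ... ] block by pattern and names each result after the label text that precedes it. The function name parse_output, the name "trials" for the unlabelled first block, and the use of NA for None are assumptions made for illustration, not part of the original answer:

```r
# Hypothetical helper: parse the whole file without relying on line positions
parse_output <- function(lines) {
  txt <- paste(lines, collapse = " ")   # also rejoins a list wrapped over several lines
  m <- gregexpr("\\[[^]]*\\]", txt)     # locate every "[ ... ]" block
  blocks <- regmatches(txt, m)[[1]]
  # the text in front of each block is its label; text after the last block is dropped
  labels <- regmatches(txt, m, invert = TRUE)[[1]][seq_along(blocks)]
  labels <- sub(":$", "", trimws(labels))
  labels[labels == ""] <- "trials"      # assumed name for the unlabelled first block

  parse_block <- function(b) {
    is_tuples <- grepl("(", b, fixed = TRUE)
    b <- gsub("[^0-9A-Za-z.,-]", "", b)       # keep numbers, commas and the word None
    b <- gsub("None", "NA", b, fixed = TRUE)  # missing values become NA
    vals <- as.numeric(strsplit(b, ",", fixed = TRUE)[[1]])
    if (is_tuples) data.frame(matrix(vals, ncol = 3, byrow = TRUE)) else vals
  }
  setNames(lapply(blocks, parse_block), labels)
}

# Shortened sample in the same layout, including a wrapped first line
sample_lines <- c(
  "[(3, None, 1), (1, 0.36, ",
  "1), (2, 0.41, 1)]",
  " Percentage of correctly suppressed responses per five-target section: ",
  "[80, 80, 100, 80]",
  " Average reaction time per five-target section: ",
  "[0.4, 0.43, 0.39, 0.39]"
)
res <- parse_output(sample_lines)
str(res)
```

On a real file the call would be parse_output(readLines("filename.txt")). Naming the elements by their labels means a vector can be picked out by what it measures rather than by its position in the list.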

This question, "r - Parsing multiple delimiters and embedded braces in R", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/20534982/
