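I have data in a text file with several columns, and I want to process it without losing any information. Some cells contain two or more information flags separated by a special character (for example the plus sign "+"), and I want to put these combined entries into separate rows of the same column. I have pasted example data below.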
My data frame looks like this:
df <- data.frame(G1=c("GH13_22+CBM4", "GH109+PL7+GH9","GT57", "AA3","",""),
G2=c("GH13_22","","GT57+GH15","AA3", "GT41","PL+PL2"),
G3=c("GH13", "GH1O9","", "CBM34+GH13+CBM48", "GT41","GH16+CBM4+CBM54+CBM32"))
             G1        G2                    G3
1  GH13_22+CBM4   GH13_22                  GH13
2 GH109+PL7+GH9                           GH1O9
3          GT57 GT57+GH15
4           AA3       AA3      CBM34+GH13+CBM48
5                    GT41                  GT41
6                  PL+PL2 GH16+CBM4+CBM54+CBM32
The expected result should look like this:
df2 <- data.frame(G1=c("GH13_22","CBM4", "GH109","PL7","GH9","GT57", "AA3","","","",""),
G2=c("GH13_22","","GT57","GH15","AA3", "GT41","PL","PL2","","",""),
G3=c("GH13", "GH1O9","", "CBM34","GH13","CBM48", "GT41","GH16","CBM4","CBM54","CBM32"))
        G1      G2    G3
1  GH13_22 GH13_22  GH13
2     CBM4         GH1O9
3    GH109    GT57
4      PL7    GH15 CBM34
5      GH9     AA3  GH13
6     GT57    GT41 CBM48
7      AA3      PL  GT41
8              PL2  GH16
9                   CBM4
10                 CBM54
11                 CBM32
Any help is appreciated. Thanks!
Best answer
A base R solution (the \(x) lambda shorthand requires R 4.1.0 or later):
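# Split each column on "+"; empty strings are first turned into NA, which strsplit() returns unchanged, so each empty cell keeps one placeholder
parts <- lapply(df, \(x) unlist(strsplit(replace(x, x == '', NA_character_), '\\+')))
# Index every column up to the longest length (positions past the end become NA) and rebuild the data frame
as.data.frame(lapply(parts, `[`, 1:max(lengths(parts))))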
G1 G2 G3
1 GH13_22 GH13_22 GH13
2 CBM4 <NA> GH1O9
3 GH109 GT57 <NA>
4 PL7 GH15 CBM34
5 GH9 AA3 GH13
6 GT57 GT41 CBM48
7 AA3 PL GT41
8 <NA> PL2 GH16
9 <NA> <NA> CBM4
10 <NA> <NA> CBM54
11 <NA> <NA> CBM32
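The answer's output uses NA where the question's expected result shows empty strings. Below is a minimal sketch of one way to close that gap, assuming R 4.0 or later (where data.frame() no longer converts strings to factors by default). It follows the same split-and-pad idea but writes the padding back as "" so the result can be compared directly with df2. It also swaps the regex '\\+' for fixed = TRUE; both match a literal plus sign.

```r
# Input data from the question
df <- data.frame(
  G1 = c("GH13_22+CBM4", "GH109+PL7+GH9", "GT57", "AA3", "", ""),
  G2 = c("GH13_22", "", "GT57+GH15", "AA3", "GT41", "PL+PL2"),
  G3 = c("GH13", "GH1O9", "", "CBM34+GH13+CBM48", "GT41", "GH16+CBM4+CBM54+CBM32")
)

# Expected result from the question
df2 <- data.frame(
  G1 = c("GH13_22", "CBM4", "GH109", "PL7", "GH9", "GT57", "AA3", "", "", "", ""),
  G2 = c("GH13_22", "", "GT57", "GH15", "AA3", "GT41", "PL", "PL2", "", "", ""),
  G3 = c("GH13", "GH1O9", "", "CBM34", "GH13", "CBM48", "GT41", "GH16", "CBM4", "CBM54", "CBM32")
)

# Split each column on a literal "+"; empty cells become NA first so that
# strsplit() passes them through as a single placeholder entry
parts <- lapply(df, function(x) {
  unlist(strsplit(replace(x, x == "", NA_character_), "+", fixed = TRUE))
})

# Pad every column to the same length (indexing past the end yields NA),
# then turn all NA placeholders back into empty strings
n <- max(lengths(parts))
res <- as.data.frame(lapply(parts, function(x) {
  x <- x[seq_len(n)]
  x[is.na(x)] <- ""
  x
}))

print(identical(res, df2))
```

Because the last step replaces every NA with "", any NA that was already present in the input would also come out as an empty string; that does not matter for the question's data, but keep it in mind for other inputs.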
For the original question, "r - Create new rows for items in a data frame if they are separated by a special character such as the "+" sign in R", see Stack Overflow: https://stackoverflow.com/questions/75670407/