Sample data:
X <- as.matrix(c("2019.01.01 (TUE) A STADIUM [spectator : 4000]", "2019.01.01 (TUE) C STADIUM [spectator : 3600]", "2018.01.02 (WED) B STADIUM [spectator : 2800]", "2019.01.02 (WED) D STADIUM [spectator : 3500]"))
X
[,1]
[1,] 2019.01.01 (TUE) A STADIUM [spectator : 4000]
[2,] 2019.01.01 (TUE) C STADIUM [spectator : 3600]
[3,] 2018.01.02 (WED) B STADIUM [spectator : 2800]
[4,] 2019.01.02 (WED) D STADIUM [spectator : 3500]
I want to separate this data into 3 or 4 columns, like this:
Day Day2 STADIUM Spectator
1 2019.01.01 TUE A STADIUM 4000
2 2019.01.01 TUE C STADIUM 3600
3 2018.01.02 WED B STADIUM 2800
4 2019.01.02 WED D STADIUM 3500
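What I have tried:
str_split returns a list, so I used str_split_fixed instead. It requires an n value, so I set n = 4. But it split on the punctuation marks, and the three dots in the date used up all the splits: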
str_split_fixed(X, n = 4, '[[:punct:]]')
[,1] [,2] [,3] [,4]
[1,] "2019" "01" "01 " "TUE) A STADIUM [spectator : 4000]"
[2,] "2019" "01" "01 " "TUE) C STADIUM [spectator : 3600]"
[3,] "2018" "01" "02 " "WED) B STADIUM [spectator : 2800]"
[4,] "2019" "01" "02 " "WED) D STADIUM [spectator : 3500]"
Best answer
We can use tidyr::extract by defining the capture groups we want to extract:
tidyr::extract(data.frame(X), X, into = c("Day", "Day2", "Stadium", "Spectator"),
regex = "(.*)\\((.*)\\).*([A-Z]+ STADIUM).*spectator : (\\d+)")
# Day Day2 Stadium Spectator
#1 2019.01.01 TUE A STADIUM 4000
#2 2019.01.01 TUE C STADIUM 3600
#3 2018.01.02 WED B STADIUM 2800
#4 2019.01.02 WED D STADIUM 3500
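We have defined 4 capture groups here:
1) Everything from the start of the text up to the opening parenthesis. Because (.*) is greedy, this group also contains the space before the parenthesis; apply trimws() to the Day column if that matters.
2) The text between the opening and closing parentheses.
3) One or more characters in [A-Z] followed by a space and the word "STADIUM".
4) The digits after "spectator : ".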
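For readers who would rather not depend on tidyr, the same capture-group idea can be sketched in base R with regexec() and regmatches(). This is only an illustration, not the answer's own code: the names pattern, m, parts and result are made up for the example, the pattern is a slightly modified variant of the one above (lazy quantifiers plus explicit \\s* so that the first group has no trailing space), and it assumes every row matches the pattern.

```r
# Same sample data as in the question (a one-column character matrix).
X <- as.matrix(c(
  "2019.01.01 (TUE) A STADIUM [spectator : 4000]",
  "2019.01.01 (TUE) C STADIUM [spectator : 3600]",
  "2018.01.02 (WED) B STADIUM [spectator : 2800]",
  "2019.01.02 (WED) D STADIUM [spectator : 3500]"
))

# Variant of the answer's regex (an assumption of this sketch):
# lazy groups and \\s* keep surrounding spaces out of the captures.
pattern <- "^(.*?)\\s*\\((.*?)\\)\\s*([A-Z]+ STADIUM).*spectator : (\\d+)"

# For each string, regexec() + regmatches() return the full match
# followed by the four capture groups.
m <- regmatches(X[, 1], regexec(pattern, X[, 1], perl = TRUE))

# Stack the per-row results into a matrix and drop column 1 (the full match).
# Assumes every row matched; a non-matching row would yield character(0).
parts <- do.call(rbind, m)[, -1, drop = FALSE]

result <- data.frame(
  Day       = parts[, 1],
  Day2      = parts[, 2],
  Stadium   = parts[, 3],
  Spectator = as.integer(parts[, 4]),  # convert the digits to integers
  stringsAsFactors = FALSE
)
result
```

Converting Spectator with as.integer() is a choice made for this sketch; tidyr::extract leaves all columns as character unless convert = TRUE is passed.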
Regarding "r - How to separate character values in a matrix using a regular expression", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58058093/