我正在读取 R 中的文本文件,并且想要替换每 3 次出现的“|”带有 '\n' 这是我的代码和输入数据
**Input Data**
======================
'Monday, November 2, 2015|10:21:27|17:58:12|Tuesday, November 3, 2015|10:13:09|18:52:44|Wednesday, November 4, 2015|10:11:52|18:40:36|Thursday, November 5, 2015|10:31:42|18:16:57|Friday, November 6, 2015|10:13:13|--|Saturday, November 7, 2015|--|--|Sunday, November 8, 2015|--|--|Monday, November 9, 2015|--|--|Tuesday, November 10, 2015|10:03:20|18:07:52|Wednesday, November 11, 2015|09:40:20|18:42:20|Thursday, November 12, 2015|10:38:56|18:37:20|Friday, November 13, 2015|10:45:26|18:09:54|Saturday, November 14, 2015|--|--|Sunday, November 15, 2015|--|--|Monday, November 16, 2015|--|--|Tuesday, November 17, 2015|10:11:43|18:36:15|Wednesday, November 18, 2015|--|--|Thursday, November 19, 2015|--|--|Friday, November 20, 2015|12:14:25|20:25:08|Saturday, November 21, 2015|--|--|Sunday, November 22, 2015|--|--|Monday, November 23, 2015|10:08:08|17:57:35|Tuesday, November 24, 2015|14:30:32|--|'
**My R-Code**
====================
emp <- readChar(FileDir, (file.info(FileDir)$size-172))
emp <- gsub("\r\n","|",emp)
empTMP <- gsub('([^|]*|[^|]*|[^|]*)|',"\1\n",emp)
**output**
====================
"\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n"
**Required output**
====================
Monday, November 2, 2015|10:21:27|17:58:12
Tuesday, November 3, 2015|10:13:09|18:52:44
Wednesday, November 4, 2015|10:11:52|18:40:36
Thursday, November 5, 2015|10:31:42|18:16:57
Friday, November 6, 2015|10:13:13|--
Saturday, November 7, 2015|--|--
Sunday, November 8, 2015|--|--
Monday, November 9, 2015|--|--
Tuesday, November 10, 2015|10:03:20|18:07:52
Wednesday, November 11, 2015|09:40:20|18:42:20
Thursday, November 12, 2015|10:38:56|18:37:20
Friday, November 13, 2015|10:45:26|18:09:54
Saturday, November 14, 2015|--|--
Sunday, November 15, 2015|--|--
Monday, November 16, 2015|--|--
Tuesday, November 17, 2015|10:11:43|18:36:15
Wednesday, November 18, 2015|--|--
Thursday, November 19, 2015|--|--
Friday, November 20, 2015|12:14:25|20:25:08
Saturday, November 21, 2015|--|--
Sunday, November 22, 2015|--|--
Monday, November 23, 2015|10:08:08|17:57:35
Tuesday, November 24, 2015|14:30:32|--
请帮助我做错了什么,我在文本编辑器中检查了上面的正则表达式,它工作得很好,但是在“R”中它没有产生正确的结果。
最佳答案
以下作品:
#input <- #your input string
x <- strsplit(input, split = "|", fixed = TRUE)[[1L]]
idx <- seq(3L, length(x), by = 3L)
x[idx] <- paste0(x[idx], "\n")
x[-idx] <- paste0(x[-idx], "|")
paste(x, collapse = "")
或者在一个命令中:
paste(paste0(x <- strsplit(input, split = "|", fixed = TRUE)[[1L]],
rep_len(c("|", "|", "\n"), length(x))), collapse = "")
如果您想坚持使用 gsub
,这也有效:
gsub("([^|]*\\|[^|]*\\|[^|]*)\\|", "\\1\n", input)
gsub(paste0("(", #start capturing group 1
"[^|]*", #Matching anything but | 0 or more times
"\\|", #Match | (must escape because it's reserved for OR)
"[^|]*\\|", #again
"[^|]*", #again matching anything but |
")", #end captured group
"\\|"), #captured group is followed by a third |
"\\1\n",input) #replace match with captured group followed by \n
# (instead of |)
(刚刚注意到您最初的尝试非常接近。只是您忘记了正确转义:“\\1”
,而不是“\1”
,并且"|"
是保留的,所以我们也必须转义。@CAFEBABE 是对的,这似乎更适合 awk
...)
关于regex - R 中第 n 次出现的正则表达式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34571435/