regex - Why doesn't this working regular expression work with sed?

Tags: regex sed

I have text of this type:

Song of Solomon 1:1: The song of songs, which is Solomon’s.
John 3:16:For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
III John 1:8: We therefore ought to receive such, that we might be fellowhelpers to the truth.

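I am trying to remove the verse reference (or metadata, if you prefer) and keep only the plain text content. The sample text shows three different types of verse reference (multi-word, single-word, and Roman numeral plus word). I think the easiest approach is to match everything from the start of each line up to and including "number:number:", then replace it with "" (the empty string).

I tested a regex that seems to work (as I described):

  1. First match everything before "number:number:", without consuming it [i.e.: .+?(?=(\s+)(\d+)(:)(\d+)(:))],
  2. Then append the "number:number:" pattern itself [i.e.: (\s+)(\d+)(:)(\d+)(:)]

This results in the following regex: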

.+?(?=(\s+)(\d+)(:)(\d+)(:))(\s+)(\d+)(:)(\d+)(:)

The regex seems to work fine (you can try it here). The problem is that when I try to use it with sed, it does not work:

$ sed 's/.+?(?=(\s+)(\d+)(:)(\d+)(:))(\s+)(\d+)(:)(\d+)(:)//g' testcase.txt

It produces text identical to the input, when it should produce:

 The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
 We therefore ought to receive such, that we might be fellowhelpers to the truth.

Can anyone help?

Thanks a lot!

Best answer

This awk should do it:

awk -F": *" '{print $3}' file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.

To be safer, split on the number:number: pattern itself:

awk -F"[0-9]+:[0-9]+: *" '{print $2}' file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.

This also avoids problems when the verse text itself contains a ":".

Using Adams regex, we can shorten it a bit:
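To see the difference between the two field separators, here is a small sketch. The input line is made up for illustration (it is not from the question) and has a colon inside the verse text. It assumes a POSIX awk:

```shell
# Hypothetical input line: the text after the reference contains its own ":".
line='John 3:16:For God: so loved the world'

# Splitting on every ":" cuts the verse text off at the colon inside it.
by_colon=$(printf '%s\n' "$line" | awk -F": *" '{print $3}')

# Splitting only on "number:number:" keeps the rest of the line intact.
by_ref=$(printf '%s\n' "$line" | awk -F"[0-9]+:[0-9]+: *" '{print $2}')

printf '%s\n' "$by_colon"   # For God
printf '%s\n' "$by_ref"     # For God: so loved the world
```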

awk -F"([0-9]+:){2} ?" '{print $2}' file

awk -F"([0-9]+:){2} ?" '{$0=$2}1' file
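As for why the original sed command leaves the input unchanged: sed uses POSIX regular expressions, which have no lookahead such as (?=...) and no lazy quantifiers such as +?, and \d is not a POSIX digit class, so the PCRE-style pattern never matches. Below is a minimal sed sketch of the same idea. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do), and the helper name strip_ref is made up for illustration:

```shell
# strip_ref is a hypothetical helper name, not part of the original answer.
# It removes everything from the start of the line through "number:number:"
# plus one optional space. [^:]* cannot cross a colon, so the match ends at
# the first reference and never reaches into the verse text.
strip_ref() {
  sed -E 's/^[^:]*[0-9]+:[0-9]+: ?//'
}

printf '%s\n' 'III John 1:8: We therefore ought to receive such' | strip_ref
```

Unlike the output asked for in the question, this also drops the single space after the reference; remove the " ?" from the pattern to keep it.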

Regarding "regex - Why doesn't this working regular expression work with sed?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28639112/
