regex - sed: remove all non-alphanumeric characters inside quotes only

Tags: regex bash sed alphanumeric non-alphanumeric

Suppose I have a string like this:

Output:   
I have some-non-alphanumeric % characters remain here, I "also, have_+ some & .here"

I want to remove the non-alphanumeric characters only inside the quotes, keeping commas, periods, and spaces:

Desired Output:    
I have some-non-alphanumeric % characters remain here, I "also, have some  .here"

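I tried the following sed command, which matches the line and is meant to delete only inside the quotes, but it deletes everything inside the quotes, including the quotes themselves: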

sed '/characters/ s/\("[^"]*\)\([^a-zA-Z0-9\,\. ]\)\([^"]*"\)//g'

Any help getting the desired output is appreciated, preferably using sed. Thanks in advance!

Best answer

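You need to repeat the substitution several times to remove all the non-alphanumeric characters, because each pass removes only one run of them. Doing such a loop in sed requires a label and the b and t commands: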

sed '
# If the line contains /characters/, jump to label repremove
/characters/ b repremove
# else, jump to end of script
b
# labels are introduced with colons
:repremove
# This s command says: find a quote mark and some stuff we do not want
# to remove, then some stuff we do want to remove, then the rest until
# a quote mark again. Replace it with the two things we did not want to
# remove
s/\("[a-zA-Z0-9,. ]*\)[^"a-zA-Z0-9,. ][^"a-zA-Z0-9,. ]*\([^"]*"\)/\1\2/
# The t command repeats the loop until we have gotten everything
t repremove
'

(It works even without the [^"a-zA-Z0-9,. ]* part, but it will be slower on lines that contain many non-alphanumeric characters in a row.)
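To try the loop on the sample line from the question, here is a minimal sketch that wraps the same substitution in a shell function. The function name strip_in_quotes is hypothetical, and the /characters/ address is left out on the assumption that every input line should be processed.

```shell
# Hypothetical helper: same label/s/t loop as above, written as -e fragments.
# Assumption: no /characters/ filter, so every line goes through the loop.
strip_in_quotes() {
  sed -e ':repremove' \
      -e 's/\("[a-zA-Z0-9,. ]*\)[^"a-zA-Z0-9,. ][^"a-zA-Z0-9,. ]*\([^"]*"\)/\1\2/' \
      -e 't repremove'
}

# Feed the sample line from the question through the helper.
printf '%s\n' 'I have some-non-alphanumeric % characters remain here, I "also, have_+ some & .here"' | strip_in_quotes
```

For that input it prints I have some-non-alphanumeric % characters remain here, I "also, have some  .here", which is the desired output: the first pass removes _+, the second removes &, and the third finds nothing left to remove, so t does not branch again.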

That said, the other answer is right that this is much easier to do in perl.

Source: a similar question on Stack Overflow, "regex - sed: remove all non-alphanumeric characters inside quotes only": https://stackoverflow.com/questions/28144073/
