Suppose I have a string like this:
Output:
I have some-non-alphanumeric % characters remain here, I "also, have_+ some & .here"
I want to remove only the non-alphanumeric characters inside the quotes, except for commas, periods, and spaces:
Desired Output:
I have some-non-alphanumeric % characters remain here, I "also, have some .here"
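I tried the following sed command to match the line and strip the characters inside the quotes, but it deletes everything inside the quotes, including the quotes themselves: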
sed '/characters/ s/\("[^"]*\)\([^a-zA-Z0-9\,\. ]\)\([^"]*"\)//g'
Any help getting the desired output is appreciated, preferably using sed. Thanks in advance!
Best Answer
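You need to repeat the substitution several times to remove all of the non-alphanumeric characters. Looping like this in sed requires a label and the b and t commands: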
sed '
# If the line contains /characters/, jump to label repremove
/characters/ b repremove
# else, jump to end of script
b
# labels are introduced with colons
:repremove
# This s command says: find a quote mark and some stuff we do not want
# to remove, then some stuff we do want to remove, then the rest until
# a quote mark again. Replace it with the two things we did not want to
# remove
s/\("[a-zA-Z0-9,. ]*\)[^"a-zA-Z0-9,. ][^"a-zA-Z0-9,. ]*\([^"]*"\)/\1\2/
# The t command repeats the loop until we have gotten everything
t repremove
'
(This works even without the [^"a-zA-Z0-9,. ]* part, but it will be slower on lines that contain many non-alphanumeric characters in a row.)
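As a quick check, the loop above can be wrapped in a shell function and run against the sample line from the question. This is only a sketch: the function names strip_quoted and strip_quoted_short are made up for illustration, and the script is written as separate -e expressions rather than one multi-line script, with `/characters/!b` standing in for the two branch commands at the top.

```shell
# Hypothetical helper: same loop as above, one -e expression per command.
strip_quoted() {
  # Lines without "characters" skip to the end of the script unchanged.
  sed -e '/characters/!b' \
      -e ':repremove' \
      -e 's/\("[a-zA-Z0-9,. ]*\)[^"a-zA-Z0-9,. ][^"a-zA-Z0-9,. ]*\([^"]*"\)/\1\2/' \
      -e 't repremove'
}

# Hypothetical variant without the extra [^"a-zA-Z0-9,. ]* part:
# it removes one unwanted character per pass, so it loops more often.
strip_quoted_short() {
  sed -e '/characters/!b' \
      -e ':repremove' \
      -e 's/\("[a-zA-Z0-9,. ]*\)[^"a-zA-Z0-9,. ]\([^"]*"\)/\1\2/' \
      -e 't repremove'
}

input='I have some-non-alphanumeric % characters remain here, I "also, have_+ some & .here"'
printf '%s\n' "$input" | strip_quoted
# prints: I have some-non-alphanumeric % characters remain here, I "also, have some  .here"
```

Note that removing the & leaves both of its neighbouring spaces in place, so the result has two spaces before .here. Also, sed does not track which quote opens and which closes a string, so on a line with several quoted strings the pattern can match from a closing quote to the next opening quote and strip characters between them as well.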
That said, the other answer is right that this is much easier to do in perl.
Regarding "regex - sed: remove all non-alphanumeric characters inside quotes only", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28144073/