I have a file in the following format:
FA01_01:The birch canoe slid on the smooth planks
FA01_02:Glue the sheet to the dark blue background
I need it in this form (note that the text is also lowercased):
<s> the birch canoe slid on the smooth planks </s> (FA01_01)
<s> glue the sheet to the dark blue background </s> (FA01_02)
So I tried the following sed expression:
sed 's/\(.......\):\(.*$\)/<s> \2 <\/s> (\1)/' tmp.dat
But this is what it returned:
</s> (FA01_01)anoe slid on the smooth planks
</s> (FA01_02)eet to the dark blue background
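For whatever reason, sed seems to make the replacement text wrap around to the beginning of the line, but only when it follows the second capture group. For example: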
$> sed 's/\(.......\):\(.*$\)/\1 \2/' tmp.dat
FA01_01 The birch canoe slid on the smooth planks
is correct, but
$> sed 's/\(.......\):\(.*$\)/\2 \1/' tmp.dat
FA01_01h canoe slid on the smooth planks
This even happens in awk. To test the wrap-around hypothesis:
$> awk 'BEGIN{FS=":"}{print tolower($2) "XXX"}' tmp.dat
XXX birch canoe slid on the smooth planks
but
$> awk 'BEGIN{FS=":"}{print tolower($1) "XXX"}' tmp.dat
fa01_01XXX
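Any idea what causes this wrap-around? Does it have to do with the second capture group, or with the fact that the saved field runs to the end of the line?
Best answer
The cause is most likely that your tmp.dat is in DOS format, so each line ends in \r\n. The trailing \r (carriage return) ends up in the second capture group (or in awk's $2). When the terminal prints it, the cursor jumps back to the start of the line, and whatever follows overwrites the beginning of the output. You can convert the file to Unix format (\n only), for example with: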
dos2unix tmp.dat
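To confirm the diagnosis before converting anything, you can look for carriage returns directly. The following is a minimal sketch rather than part of the original answer; the file name sample_dos.dat is a hypothetical stand-in for tmp.dat, created here so the example is self-contained.

```shell
# Recreate one DOS-format line; sample_dos.dat is a hypothetical stand-in for tmp.dat.
printf 'FA01_01:The birch canoe slid on the smooth planks\r\n' > sample_dos.dat

# Hold a literal carriage return in a variable (portable across POSIX shells).
cr=$(printf '\r')

# Count the lines that end in a carriage return; 0 would mean Unix line endings.
crlf_lines=$(grep -c "${cr}\$" sample_dos.dat)
echo "lines ending in CR: $crlf_lines"

# Dump the last bytes of the file; a DOS line shows \r immediately before \n.
tail -c 8 sample_dos.dat | od -c

rm -f sample_dos.dat
```

If the count is nonzero for your real file, the \r characters are what make the output appear to wrap.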
Once the file has been converted, run:
sed 's/\(.......\):\(.*$\)/<s>\L \2 \E<\/s> (\1)/' tmp.dat
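Note that \L and \E are GNU sed extensions, so the command above may not work with other sed implementations. If dos2unix or GNU sed is unavailable, a similar result can be had by stripping the carriage returns with tr and doing the reordering and lowercasing in awk. This is a sketch under that assumption, not the answer's own method, and sample_dos.dat is again a hypothetical stand-in for tmp.dat.

```shell
# Recreate the two DOS-format lines; sample_dos.dat is a hypothetical stand-in for tmp.dat.
printf 'FA01_01:The birch canoe slid on the smooth planks\r\nFA01_02:Glue the sheet to the dark blue background\r\n' > sample_dos.dat

# tr deletes every carriage return; awk then splits each line at its first colon,
# lowercases the sentence, and prints it before the ID.
result=$(tr -d '\r' < sample_dos.dat |
  awk '{
    i = index($0, ":")                      # position of the first colon
    id = substr($0, 1, i - 1)               # text before the colon, e.g. FA01_01
    text = tolower(substr($0, i + 1))       # text after the colon, lowercased
    print "<s> " text " </s> (" id ")"
  }')
echo "$result"

rm -f sample_dos.dat
```

Splitting with index() instead of FS=":" keeps the sentence intact even if it contains a colon of its own.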
Source: the Stack Overflow question on regex, sed, and awk causing line wrap-around: https://stackoverflow.com/questions/24893114/