Linux text file manipulation

I have a file in the following format:

<a href="http://www.wowhead.com/?search=Superior Mana Oil">  
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">  
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">  
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">

I need to take the text after the = but before the ", and append it to the end of the line followed by a closing </a>, for example:

<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a>
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a>
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a>
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>

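I'm not sure of the best way to do this from the Linux command line (I'm guessing sed or awk, but I'm not very familiar with them). Ideally I'd like a script that I can just pass the filename to, e.g. ./fixlink.sh brokenlinks.txt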

Best answer

Assuming there can be one or more spaces after <a, and zero or more spaces around the = signs, the following should work:

$ cat in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">
#
# The command to do the substitution
#
$ sed -e 's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#' in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a>
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a>
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a>
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>
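The question asks for a script that takes a filename, like ./fixlink.sh brokenlinks.txt. A minimal sketch of one, assuming a POSIX sh and that writing the result to standard output is acceptable: the block below creates such a fixlink.sh with a here-document. The sed expression is the one above, except that [ \t] is swapped for [[:blank:]] (space or tab), since not every sed treats \t inside brackets as a tab.

```shell
# Create a hypothetical fixlink.sh wrapper around the sed command above.
cat > fixlink.sh <<'EOF'
#!/bin/sh
# Usage: ./fixlink.sh FILE > OUTFILE
# Appends the text after search= as link text and closes each tag with </a>.
if [ "$#" -ne 1 ]; then
    echo "usage: $0 FILE" >&2
    exit 1
fi
# [[:blank:]] matches a space or a tab in any POSIX sed.
sed -e 's#<a[[:blank:]][[:blank:]]*href[[:blank:]]*=[[:blank:]]*".*search[[:blank:]]*=[[:blank:]]*\([^"]*\)">#&\1</a>#' "$1"
EOF
chmod +x fixlink.sh
```

Then ./fixlink.sh brokenlinks.txt > fixedlinks.txt writes the corrected lines to a new file (fixedlinks.txt is only an example name). Redirecting straight back onto brokenlinks.txt would truncate it before sed gets to read it.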

If you are sure there are no extra spaces, the pattern simplifies to:

s#<a href=".*search=\([^"]*\)">#&\1</a>#
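A quick check of why the simplified pattern needs that guarantee, on two hypothetical input lines: the first has exactly one space after <a and is rewritten; the second has two spaces after <a, so the pattern does not match and sed prints the line unchanged.

```shell
simple='s#<a href=".*search=\([^"]*\)">#&\1</a>#'

# One space after <a, none around '=': the line is rewritten.
ok=$(printf '%s\n' '<a href="http://www.wowhead.com/?search=Tabard of Brute Force">' | sed "$simple")
# Two spaces after <a: no match, so sed passes the line through untouched.
skipped=$(printf '%s\n' '<a  href="http://www.wowhead.com/?search=Tabard of Brute Force">' | sed "$simple")

echo "$ok"
echo "$skipped"
```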

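In sed, s followed by any character (# in this case) starts a substitution. The pattern to be replaced runs up to the next occurrence of that same character. So in our second example, the pattern to be replaced is <a href=".*search=\([^"]*\)">. I used \([^"]*\) to mean any sequence of non-" characters, saved in the backreference \1 (a \( \) pair marks the group to capture). Finally, the next #-delimited token is the replacement. & in sed stands for "whatever was matched", which in this case is the whole line, and \1 is just the link text.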
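A small demonstration of the difference between & and \1, using the first sample line: the replacement [&] [\1] wraps each one in brackets so you can see what it holds. The second command uses | instead of # as the delimiter, which sed accepts for s just the same.

```shell
line='<a href="http://www.wowhead.com/?search=Superior Mana Oil">'

# '#' as the delimiter: & is the whole match, \1 only the captured group.
both=$(printf '%s\n' "$line" | sed 's#<a href=".*search=\([^"]*\)">#[&] [\1]#')
echo "$both"

# Same edit as the simplified command, but with '|' as the delimiter.
piped=$(printf '%s\n' "$line" | sed 's|<a href=".*search=\([^"]*\)">|&\1</a>|')
echo "$piped"
```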

Here is the full pattern again:

's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#'

and its explanation:

'                       quote so as to avoid shell interpreting the characters
s                       substitute
#                       delimiter
<a[ \t][ \t]*           <a followed by one or more whitespace
href[ \t]*=[ \t]*       href followed by optional space, = followed by optional space
".*search[ \t]*=[ \t]*  " followed by as many characters as needed, followed by
                        search, optional space, =, followed by optional space
\([^"]*\)               a sequence of non-" characters, saved in \1
">                      followed by ">
#                       delimiter, replacement pattern starts
&\1                     the matched pattern, followed by backreference \1.
</a>                    end the </a> tag
#                       end delimiter
'                       end quote
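As a check of the whitespace handling described above, the following runs the full pattern on a hypothetical line with a tab after <a and spaces around the href =. It uses [[:blank:]] in place of [ \t]: GNU sed reads \t inside brackets as a tab, but in a strictly POSIX sed (such as the BSD/macOS one) a backslash inside brackets is literal, so [ \t] may match a backslash or a t instead.

```shell
# Portable spelling of the answer's pattern: [[:blank:]] is a space or a tab.
pat='s#<a[[:blank:]][[:blank:]]*href[[:blank:]]*=[[:blank:]]*".*search[[:blank:]]*=[[:blank:]]*\([^"]*\)">#&\1</a>#'

# Hypothetical input: a tab after <a and spaces around the href '='.
input=$(printf '<a\thref = "http://www.wowhead.com/?search=Tattered Hexcloth Sack">')
fixed=$(printf '%s\n' "$input" | sed "$pat")
printf '%s\n' "$fixed"
```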

If you are really sure there will always be search= followed by the text you want, you can just do:

$ sed -e 's#.*search=\(.*\)">#&\1</a>#' in.txt

Hope this helps.

A similar question about Linux text file manipulation can be found on Stack Overflow: https://stackoverflow.com/questions/2101004/
