html - Extract cell values from an html table using bash

I'm trying to create a BASH/Perl script that will get specific values from a dynamic html table.

Here is a sample of my page


<table border="1" bordercolor="#FFCC00" style="background-color:#FFFFCC" width="100%" cellpadding="3" cellspacing="3">

<tr align="center">

<th>Environment</th><th>Release Track</th><th>Artifact</th><th>Name</th><th>Build #</th><th>Cert Idn</th><th>Build Idn</th><th>Request Status</th><th>Update Time</th><th>Log Info.</th><th>Initiator</th>

</tr>

<tr>
<td>DEV03</td><td>2.1.0</td><td>abpa</td><td>ecom-abpa-ear</td><td>204</td><td>82113</td><td>171242</td><td>Deployed</td><td>3/18/2013 3:10:58 PM</td><td width="70">Log info</td><td>CESAR</td>
</tr>

<tr>
<td>DEV03</td><td>2.1.0</td><td>abpa</td><td>abpa_dynamic_config_properties</td><td>20</td><td>82113</td><td>167598</td><td>Deployed</td><td>3/18/2013 2:32:27 PM</td><td width="70">Log info</td><td>CESAR</td>

</tr>

</table>

My goal is to get this value from this cell.

"Deployed"

Another way to look at it...

Retrieve all data under the "Request Status" column

The value "Deployed" is dynamic and could change.

I have tried the following:

sed -e 's/>/>\n/g' abpa_cesar_status.txt | egrep -i "^\s*[A-Z]+</td>" | sed -e 's|</td>||g' | grep Deployed

But that only greps for the literal string "Deployed", so it stops working as soon as the status changes.

Any ideas?

Best answer

You should use a parser such as xmllint for this.

With xmllint, you can extract elements based on an xpath.

For example:

$ xmllint --html --format --shell file.html <<< "cat //table/tr/td[position()=8]/text()"
/ >  -------
Deployed
 -------
Deployed
/ >

The xpath //table/tr/td[position()=8]/text() in the command above returns the values in the 8th column of the table, one text node per data row.
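
Hard-coding the column number breaks if the columns are ever reordered, and the interactive shell output still carries the "/ >" prompts and " -------" separators. As a minimal sketch (not part of the original answer), the column can instead be located by its header text and the shell noise filtered out. The function name column_values is hypothetical, and the sketch assumes the header text contains no double quotes and that no cell value itself starts with "/ >".

```shell
#!/bin/sh
# column_values FILE HEADER
# Print, one per line, every <td> value in the column whose <th> text is HEADER.
# (column_values is an illustrative helper name, not something xmllint provides.)
column_values() {
    file=$1
    header=$2
    # XPath step matching the header cell we are looking for.
    th="tr/th[normalize-space()=\"$header\"]"
    # Only consider tables that actually contain the header. The column
    # position is the number of <th> cells before the header, plus one.
    xpath="//table[$th]/tr/td[count(../../$th/preceding-sibling::th)+1]/text()"
    # Feed a single "cat" command to the xmllint shell, silence HTML parser
    # warnings on stderr, then drop the prompt and separator lines.
    printf 'cat %s\n' "$xpath" |
        xmllint --html --shell "$file" 2>/dev/null |
        grep -v -e '^/ >' -e '^ -------$'
}
```

With the sample page saved as abpa_cesar_status.txt, calling column_values abpa_cesar_status.txt "Request Status" would be expected to print the status of each data row, whatever value it currently holds. If the header is not found, nothing is printed rather than silently falling back to the first column.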

This question and answer about extracting cell values from an html table using bash come from Stack Overflow: https://stackoverflow.com/questions/15505171/
