html - XPath to get data that starts with a specific character or string

I need to extract certain text elements from the following code.

<div class="inhalt-links">
    <h2>
        Deutsche Verkehrswacht
        <br>
        Verkehrswacht Dortmund e. V.
        <br>
    </h2>
    <h3>
        Standnummer:&nbsp;
            <span style="font-weight: normal;">4.E08</span>
    </h3>
    <div class="clear"></div>
    <br>
    Benediktinerstraße 82
    <br>
    44287&nbsp;Dortmund
    <br>
    Deutschland
    <br>
    <br>
    Tel.:+49 231 447687
    <br>
    Fax:+49 231 447136
    <br>
    E-Mail:<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="71181f171e310714031a1419030206101219055c151e03051c041f155f1514" rel="noreferrer noopener nofollow">[email protected]</a>
    <br>
    <a href="http://www.verkehrswacht-dortmund.de" class="url" target="_blank">www.verkehrswacht-dortmund.de</a>
    <br>
    <div class="social"></div>
    <br>
</div>

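To extract the phone number, +49 231 447687, I can use div[@class='inhalt-links']/text()[4]. For the other details, such as fax, e-mail, and website, I only need to change the position index of text(). However, these text nodes sometimes appear in a different order, as in the following code: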

<div class="inhalt-links">
    <h2>
        DEW21
        <br>
    </h2>
    <h3>
        Standnummer:&nbsp;
            <span style="font-weight: normal;">4.B56</span>
    </h3>
    <div class="clear"></div>
    <br>
    Günter-Samtlebe-Platz 1
    <br>
    44135&nbsp;Dortmund
    <br>
    Postfach:104141
    <br>
    44041&nbsp;Dortmund
    <br>
    Deutschland
    <br>
    <br>
    Tel.:+49 231 544-0
    <br>
    Fax:+49 231 544-1130
    <br>
    E-Mail:<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="ceb8abbcbabca7abac8eaaabb9fcffe0aaab" rel="noreferrer noopener nofollow">[email protected]</a>
    <br>
    <a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
    <br>
    <div class="social"></div>
    <br>
</div>

Here the XPath div[@class='inhalt-links']/text()[4] selects the text "44041 Dortmund" instead of Tel.:+49 231 544-0. Is there any XPath like "div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?

Best answer

"Is there any xpath like "//div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?"

Sure, try this:

//div[@class='inhalt-links']/text()[starts-with(normalize-space(), 'Tel.:')]

This XPath returns the text node (not an element) that, after leading and trailing whitespace is stripped*, starts with the keyword Tel.:.
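As a rough illustration, the expression can be tried out from Python. This is only a sketch: it assumes the third-party lxml package is installed, uses an abridged copy of the second listing from the question, and the helper name `field` is hypothetical (it is not part of lxml). The prefix is passed in as an XPath variable so the same expression can be reused for other labels such as Fax:.

```python
from lxml import html  # assumption: lxml is installed (pip install lxml)

# Abridged copy of the second listing from the question (DEW21).
SNIPPET = """
<div class="inhalt-links">
    <h2>DEW21<br></h2>
    <div class="clear"></div>
    <br>
    Günter-Samtlebe-Platz 1
    <br>
    44135&nbsp;Dortmund
    <br>
    Postfach:104141
    <br>
    Deutschland
    <br>
    <br>
    Tel.:+49 231 544-0
    <br>
    Fax:+49 231 544-1130
    <br>
</div>
"""


def field(tree, prefix):
    """Hypothetical helper: value of the first text node starting with `prefix`."""
    nodes = tree.xpath(
        "//div[@class='inhalt-links']"
        "/text()[starts-with(normalize-space(), $p)]",
        p=prefix,  # lxml binds keyword arguments to XPath variables
    )
    if not nodes:
        return None
    # Trim the surrounding whitespace, then drop the label itself.
    return nodes[0].strip()[len(prefix):].strip()


tree = html.document_fromstring(SNIPPET)
print(field(tree, "Tel.:"))  # +49 231 544-0
print(field(tree, "Fax:"))   # +49 231 544-1130
```

Because the match depends on the label rather than on the position, the extra Postfach lines no longer shift the result.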

*) A quote describing more precisely what normalize-space() does:

The normalize-space function strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string. [Mozilla Developer Network]
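To make the quoted behaviour concrete, here is a small pure-Python approximation of normalize-space(). It is a sketch for illustration only, not how any XPath engine implements the function. It relies on the fact that XPath 1.0 treats only space, tab, carriage return, and line feed as whitespace, so a non-breaking space (as produced by &nbsp;) is left untouched.

```python
import re


def normalize_space(value: str) -> str:
    """Approximate XPath 1.0 normalize-space() for a plain Python string."""
    # Collapse every run of XPath whitespace (space, tab, CR, LF) to one space.
    collapsed = re.sub(r"[ \t\r\n]+", " ", value)
    # Remove the single leading/trailing space that may remain.
    return collapsed.strip(" ")


# A text node as it appears in the listing, including its indentation.
raw = "\n    Tel.:+49 231 447687\n    "
print(repr(normalize_space(raw)))                # 'Tel.:+49 231 447687'
print(normalize_space(raw).startswith("Tel.:"))  # True
```

Without this normalization, the text node would start with a newline and indentation, and starts-with(., 'Tel.:') would not match.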

Regarding html - XPath to get data that starts with a specific character or string, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36663142/
