java - Extracting data from HTML using Jsoup

I am writing a script to extract data from an HTML document. Here is part of the document.

<div class="info">
<div id="info_box" class="inf_clear">
    <div id="restaurant_info_box_left">
        <table id="rest_logo">
            <tr>
                <td>
                    <a itemprop="url" title="XYZ" href="XYZ.com">
                        <img src="/files/logo/26721.jpg" alt="XYZ" title="XYZ" width="100" />
                    </a>
                </td>
            </tr>
        </table>
        <h1 id="Name"><a class="fn org url" rel="Order Online" href="XYZ.com" title="XYZ" itemprop="name">XYZ</a></h1>

        <div class="rest_data" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">

            <span itemprop="telephone">(305) 535-1379</span> | <b>
            <span itemprop="streetAddress">1755 Alton Rd</span>,
            <span itemprop="addressLocality">Miami Beach</span>,
            <span itemprop="addressRegion">FL</span>
            <span itemprop="postalCode">33139</span></b>
        </div>
        <div class="geo">
            <span class="latitude" title="25.792588"></span>
            <span class="longitude" title="-80.141214"></span>
        </div>
        <div class="rest_data">Estimated delivery time: <b>45-60 min</b></div>
    </div>

</div>

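I am using Jsoup, but I am not sure how to accomplish this.

The document contains many div tags, and I am trying to match them by their unique attributes. Say, a div tag whose class attribute has the value "info":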

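    Elements divs = doc.select("div");

    for (Element div : divs) {
        // attr() already returns a String holding the whole class attribute
        String divClass = div.attr("class");
        if (divClass.equalsIgnoreCase("info")) {
            // matched: the table with id "rest_logo" should be read from this div
        }
    }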

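If it matches, I have to get the table with id "rest_logo" inside that div tag.

When I use doc.select("table"), the parser appears to search the entire document.

What I need is this: when a div tag's attribute matches, I want to get the elements and attributes inside the matched div tag.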
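On the point about doc.select("table") searching the whole document: in Jsoup, select() can also be called on a single Element, in which case only that element's descendants are searched. The following is a minimal sketch of that idea, not a definitive solution. The class name ScopedSelect, the method logoSrc, and the shortened HTML string are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedSelect {
    // Returns the src of the logo image inside div.info, or "" if there is none.
    // (Hypothetical helper, not part of the original question.)
    static String logoSrc(String html) {
        Document doc = Jsoup.parse(html);
        // first() returns null when no element matched the selector
        Element info = doc.select("div.info").first();
        if (info == null) {
            return "";
        }
        // Called on the element, select() only looks at its descendants,
        // so tables elsewhere in the document are not considered.
        return info.select("table#rest_logo img").attr("src");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real page: one unrelated table, one inside div.info
        String html = "<table id='other'><tr><td><img src='/ignored.jpg'></td></tr></table>"
                + "<div class='info'><table id='rest_logo'><tr><td>"
                + "<img src='/files/logo/26721.jpg' alt='XYZ'></td></tr></table></div>";
        System.out.println(logoSrc(html));
    }
}
```

Matching with the selector div.info also avoids comparing the raw class attribute string by hand, which would stop working as soon as the div carries more than one class.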

Expected Output: 

Name : XYZ

telephone:(305) 535-1379

streetAddress:1755 Alton Rd

addressLocality:Miami Beach

addressRegion:FL

postalCode:33139

latitude:25.792588

longitude:-80.141214

Estimated delivery time:45-60 min

Any ideas?

Best answer

    for (Element e : doc.select("div.info")) {
        System.out.println("Name: " + e.select("a.fn").text());
        System.out.println("telephone: " + e.select("span[itemprop=telephone]").text());
        System.out.println("streetAddress: " + e.select("span[itemprop=streetAddress]").text());
        // .....
    }
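The snippet above leaves the remaining fields elided. As a rough sketch of how the rest of the expected output might be collected, assuming the page looks like the fragment in the question: the address spans can be read in a loop over their itemprop attribute, the coordinates come from the title attribute of the geo spans, and the delivery time from the bold text in the rest_data div that contains the label. The class RestaurantInfo, the method extract, and the SAMPLE constant are illustrative names, not part of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class RestaurantInfo {
    // Shortened copy of the HTML fragment from the question (illustrative only)
    static final String SAMPLE =
            "<div class='info'><div id='info_box'>"
            + "<h1 id='Name'><a class='fn org url' href='XYZ.com' itemprop='name'>XYZ</a></h1>"
            + "<div class='rest_data' itemprop='address' itemscope>"
            + "<span itemprop='telephone'>(305) 535-1379</span> | <b>"
            + "<span itemprop='streetAddress'>1755 Alton Rd</span>, "
            + "<span itemprop='addressLocality'>Miami Beach</span>, "
            + "<span itemprop='addressRegion'>FL</span> "
            + "<span itemprop='postalCode'>33139</span></b></div>"
            + "<div class='geo'><span class='latitude' title='25.792588'></span>"
            + "<span class='longitude' title='-80.141214'></span></div>"
            + "<div class='rest_data'>Estimated delivery time: <b>45-60 min</b></div>"
            + "</div></div>";

    // Collects the fields in document order; returns an empty map if div.info is missing.
    static Map<String, String> extract(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> out = new LinkedHashMap<>();
        Element info = doc.select("div.info").first();
        if (info == null) {
            return out;
        }
        out.put("Name", info.select("h1#Name a").text());
        // Every span with an itemprop inside the address div: telephone, streetAddress, ...
        for (Element span : info.select("div[itemprop=address] span[itemprop]")) {
            out.put(span.attr("itemprop"), span.text());
        }
        // Latitude and longitude are stored in the title attribute, not in the text
        out.put("latitude", info.select("div.geo span.latitude").attr("title"));
        out.put("longitude", info.select("div.geo span.longitude").attr("title"));
        // :contains() picks the rest_data div that holds the label text
        out.put("Estimated delivery time",
                info.select("div.rest_data:contains(Estimated delivery time) b").text());
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : extract(SAMPLE).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

A LinkedHashMap is used here only so the fields come back in the order they were read, which keeps the printed lines in the same order as the expected output in the question.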

Regarding java - Extracting data from HTML using Jsoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34836437/
