java - Selenium: how to extract the text between two div tags

Tags: java selenium web automation

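I have just started using Selenium for web automation on a website, and I am having trouble extracting the text between two div tags.

This is the HTML snippet I am trying to extract the text from.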

 ...
<tr>
    <td width="150">
    <a href="https://rads.stackoverflow.com/amzn/click/com/B0099RGRT8" rel="nofollow noreferrer">
    <img height="90" border="0" width="90" alt="iOttie Easy Flex2 Windshield Dashboard Car Mount H&hellip by iOttie" src="http://ecx.images-amazon.com/images/I/51mf6Ry9J2L._SL500_SS90_.jpg">
    </a>
    <div class="xxsmall" style="margin-top: 5px">
        <a href="https://rads.stackoverflow.com/amzn/click/com/B0099RGRT8" rel="nofollow noreferrer">iOttie Easy Flex2 Windshield Dashboard Car Mount Holder Desk Stand for iPhone 5 4S 4 3GS Samsung Gal&amp;hellip</a>
        by iOttie
    </div>
    </td>
    <td style="padding-left: 10px;">
        <div>
            <div>
                <span style="margin-left:-5px; vertical-align: -1">

                </span>
                <b>
                <a href="http://www.amazon.com/gp/cdp/member-reviews/A2UQ07EFPSX78X/ref=cm_pdp_rev_title_1?ie=UTF8&sort_by=MostRecentReview#R12ATB4KTIWFV8">Bought for my wife, now I want one. Excellent Product.</a>
                </b>
                ,
                <span class="nowrap">November 30, 2012</span>
            </div>
            <div style="margin-top: 5px;">
                I bought this mount for my wife, the feedback from her was is that it was really nice and easy to use even while driving.
                <br>
                <br>
                So I "borrowed" it for a couple days, and now I am going to get one for myself. I am using it with an iPhone, but it would work fine with phones of all sizes, which is nice. If my phone size ever changes the mount will accommodate different sizes phones.
                <br>
                <br>
                The phone is very easy to insert and remove , even while driving.
                <br>
                The mount is easy to position but not loose enough that it doesn't hold the position you want.
                <br>
                <br>
                I was very impressed with the windshield mount, it is not just a typical suction cup mount. (Which always at some point…
                <a href="http://www.amazon.com/gp/cdp/member-reviews/A2UQ07EFPSX78X/ref=cm_pdp_rev_more?ie=UTF8&sort_by=MostRecentReview#R12ATB4KTIWFV8">Read more</a>
            </div>
        </div>
    </td>
</tr>
...

The other div tags actually contain other text as well.

What I want to extract is:

            I bought this mount for my wife, the feedback from her was is that it was really nice and easy to use even while driving.

            So I "borrowed" it for a couple days, and now I am going to get one for myself. I am using it with an iPhone, but it would work fine with phones of all sizes, which is nice. If my phone size ever changes the mount will accommodate different sizes phones.

            The phone is very easy to insert and remove , even while driving.

            The mount is easy to position but not loose enough that it doesn't hold the position you want.

            I was very impressed with the windshield mount, it is not just a typical suction cup mount. (Which always at some point…

Here is my code:

String review;
try {
    // bucketElement is the WebElement for the <tr> row shown above
    review = bucketElement.findElement(By.xpath("./td/div")).getText();
} catch (NoSuchElementException nsee) {
    review = "NA";
}

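This actually extracts all the text from every inner div tag, which is not what I want. I can target a specific div tag with ./td/div/div[3], but I cannot get just the text that sits between the div tags.

Any ideas?

Thanks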

Best answer

You can use a regular expression as a workaround:

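String review;
try {
    // getText() returns rendered text with no markup, so read the raw HTML for the regex to act on.
    String html = bucketElement.findElement(By.xpath("./td/div")).getAttribute("innerHTML");
    // Strings are immutable: keep the value that replaceAll returns.
    review = html.replaceAll("(<.+>)", "");
} catch (NoSuchElementException nsee) {
    review = "NA";
}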

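The regex removes all tags together with the text of the nested elements, so only the first-level text is left. That is, if you have:

some strange<div>other text</div> text

the resulting string will be: some strange text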
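
To check the example above without a browser, here is a minimal standalone sketch that applies the same pattern to the same string. The class and method names (StripTagsDemo, stripTags) are made up for illustration and are not part of Selenium.

```java
public class StripTagsDemo {
    // Same pattern as in the answer: a greedy match that runs from the
    // first '<' to the last '>' on a line and deletes everything in between.
    static String stripTags(String html) {
        return html.replaceAll("(<.+>)", "");
    }

    public static void main(String[] args) {
        String html = "some strange<div>other text</div> text";
        System.out.println(stripTags(html)); // prints "some strange text"
    }
}
```

Two things to keep in mind: the match is greedy, so first-level text that sits between two tags on the same line is removed too, and '.' does not match line breaks by default, so the pattern works line by line on multi-line HTML.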

If you need a more complex regular expression, here is a useful link to test it.
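
As a point of comparison, and not part of the answer above, the XPath node test text() selects only the text nodes that are direct children of an element. Selenium's By.xpath locators return elements rather than text nodes, so the sketch below illustrates the idea with the JDK's own javax.xml.xpath on a simplified fragment. Assumptions: the fragment is a shortened, well-formed stand-in for the page (note the self-closing br tags), and OwnTextDemo and ownText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextDemo {
    // Returns the direct text nodes of the element(s) matched by elementXPath,
    // trimmed and joined with newlines. Text inside child elements is skipped.
    static String ownText(String xml, String elementXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    elementXPath + "/text()", doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                String piece = nodes.item(i).getNodeValue().trim();
                if (!piece.isEmpty()) {
                    if (sb.length() > 0) {
                        sb.append('\n');
                    }
                    sb.append(piece);
                }
            }
            return sb.toString();
        } catch (Exception e) {
            // Parsing or XPath failure: surface it as an unchecked exception.
            throw new IllegalStateException("could not evaluate " + elementXPath, e);
        }
    }

    public static void main(String[] args) {
        // Simplified, well-formed stand-in for the review row (an assumption).
        String xml = "<tr><td><div>"
                + "<div><b><a href=\"#\">Title</a></b>, <span>November 30, 2012</span></div>"
                + "<div>First paragraph.<br/><br/>Second paragraph. <a href=\"#\">Read more</a></div>"
                + "</div></td></tr>";
        // Prints the two paragraphs, one per line, without the title, date, or "Read more".
        System.out.println(ownText(xml, "/tr/td/div/div[2]"));
    }
}
```

Real pages are rarely well-formed XML, so this only shows what text() selects; against a live page the same selection would have to be done some other way.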

Source: the question "Selenium, how to extract the text between two div tags" on Stack Overflow: https://stackoverflow.com/questions/15675447/
