java - Get URLs from HTML data using a list (Android Studio)

Tags: java android arrays regex list

I'm close to what I want, but I'm stuck... I have HTML data in my string contentString, which I log with Log.i(TAG, "ALL URL : " + contentString); and which looks like this:

<p><b>14th April</b></p>
<p>The wind is south west with 4 to 5 foot of swell at the peak. Streedagh will probably be the best beach break.</p>
<p><span id="more-113"></span></p>
<p>High tide: 1250  3.1m    <span style="color: #ff0000;"> <a href="http://www.bundoransurfco.com/webcam/"><strong>CLICK HERE FOR LIVE PEAK WEBCAM</strong></a></span></p>
<p>Low Tide: 1854 1.4m</p>
<p></p>
<p></p>
<style type='text/css'>
#gallery-1 {
margin: auto;
}
#gallery-1 .gallery-item {
float: left;
margin-top: 10px;
text-align: center;
width: 50%;
}
#gallery-1 img {
border: 2px solid #cfcfcf;
}
#gallery-1 .gallery-caption {
margin-left: 0;
}
/* see gallery_shortcode() in wp-includes/media.php */
</style>
<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-thumbnail'><dl class='gallery-item'>
<dt class='gallery-icon portrait'>
<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="11149460_10152656389992000_7842452340110509403_n" /></a>
</dt></dl><dl class='gallery-item'>
<dt class='gallery-icon portrait'>
<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="14th April" /></a>
</dt></dl><br style="clear: both" />
</div>
<p></p>
<p><b>3 day forecast to April 13th</b></p>
<p>Solid swell and onshore winds for the weekend. Best spots will be Rossnowlagh and Streedagh. Bundoran beaches and reefs will be blown out.</p>
<h1> Wind Charts</h1>
<p><a href="http://www.windguru.cz/int/index.php?sc=103244"><img class="size-thumbnail wp-image-747 alignleft" title="wind guru" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.xcweather.co.uk/"><img class="alignnone size-thumbnail wp-image-749" title="xcweathersmall" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg" alt="" width="67" height="68" /></a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e"><img class="alignnone size-thumbnail wp-image-750" title="buoy weather" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.windguru.cz/int/index.php?sc=103244">Wind Guru</a>       <a href="http://www.xcweather.co.uk/">XC Weather</a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e">Buoy Weather</a></p>

I only want to get the href URLs of the tags that start with <a rel="prettyPhoto[gallery-113]" ...> (there are two in my case).

For this, I use a pattern:

Pattern pattern = Pattern.compile("<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentString);
List<String> urlWithRel = new ArrayList<String>();
String lastString;
List<String> imagesUrl = null;
while (matcher.find()) {
    urlWithRel.add(matcher.group());
    lastString = urlWithRel.toString();
}
Log.i(TAG, "url with rel : " + urlWithRel);
Log.i(TAG, "final url : " + imagesUrl);
Log.i(TAG, "List size : " + imagesUrl.size());

With this first regex, I get the two tags I need:

<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'>

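Now I only want to store the URL from each href. I found a regex that extracts just the URL: (?<=href=).*(?=>)

The problem is that I can't apply another regex to a list... and if I build a single string from the list to run the regex on, the regex only works for the first object...

Here is my final code (not working):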

Pattern pattern = Pattern.compile("<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentString);
List<String> urlWithRel = new ArrayList<String>();
String lastString;
List<String> imagesUrl = null;
while (matcher.find()) {
    urlWithRel.add(matcher.group());
    lastString = urlWithRel.toString();
    Pattern lastPattern = Pattern.compile("(?<=href=).*(?=>)");
    Matcher lastMatcher = lastPattern.matcher(lastString);
    imagesUrl = new ArrayList<String>();
    while (lastMatcher.find()) {
        imagesUrl.add(lastMatcher.group());
    }
}
Log.i(TAG, "url with rel : " + urlWithRel);
Log.i(TAG, "final url : " + imagesUrl);
Log.i(TAG, "List size : " + imagesUrl.size());

It returns:

final url : ['http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg']
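
The result above is consistent with two details of the code: lastString is the toString() of the whole list rather than a single tag, and .* is greedy, so it runs to the last > in that combined string. The imagesUrl list is also recreated on every pass of the outer loop. The following is a minimal sketch of one possible way around this, assuming that a single pattern with a capturing group is acceptable and that rel always precedes href, as in the data shown. The class name GalleryLinks, the method extractHrefs, and the example.com URLs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GalleryLinks {
    // Matches the gallery anchor up to its href and captures the quoted value
    // (single or double quotes) in group 1. Assumes rel comes before href.
    private static final Pattern GALLERY_HREF = Pattern.compile(
            "<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*?href=['\"]([^'\"]*)['\"]");

    // Hypothetical helper: returns the href values of all gallery anchors.
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<String>(); // created once, outside the loop
        Matcher m = GALLERY_HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // the URL only, without the surrounding quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String html =
                "<a rel=\"prettyPhoto[gallery-113]\" href='http://example.com/a.jpg'><img src=\"x\" /></a>"
                + "<a href=\"http://example.com/other\">other</a>"
                + "<a rel=\"prettyPhoto[gallery-113]\" href='http://example.com/b.jpg'>b</a>";
        // Prints [http://example.com/a.jpg, http://example.com/b.jpg]
        System.out.println(extractHrefs(html));
    }
}
```

In the Android code this would amount to calling extractHrefs(contentString) and logging the returned list. Because the URL is taken from a capturing group, no second pass over the list is needed.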

Best answer

If you are willing to use the jsoup library, you could use the following snippet:

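List<URL> urls = new ArrayList<URL>();
Document doc = Jsoup.parse(contentString);
// Select every anchor that carries an href attribute
Elements els = doc.select("a[href]");
for (Element el : els) {
    // Keep only the links that belong to the gallery
    if (el.attr("rel").equals("prettyPhoto[gallery-113]")) {
        urls.add(new URL(el.attr("href"))); // java.net.URL; may throw MalformedURLException
    }
}

And remember to handle the MalformedURLException that the URL constructor can throw.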
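
As an illustration of that last point, here is a minimal sketch that turns href strings into java.net.URL objects and handles the exception per link. It assumes that a malformed link should be skipped rather than abort the whole run; the class name UrlParsing and the method toUrls are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UrlParsing {
    // Hypothetical helper: converts strings to URL objects, skipping any that fail to parse.
    static List<URL> toUrls(List<String> hrefs) {
        List<URL> urls = new ArrayList<URL>();
        for (String href : hrefs) {
            try {
                urls.add(new URL(href));
            } catch (MalformedURLException e) {
                // Assumption: one bad link is logged and skipped, the rest are kept
                System.err.println("Skipping malformed URL: " + href);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<URL> urls = toUrls(Arrays.asList(
                "http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg",
                "not a url"));
        System.out.println(urls.size()); // 1: the second string has no protocol
    }
}
```

Catching the exception inside the loop keeps the valid links even when one entry is bad. Wrapping the whole loop in a single try block would instead stop at the first failure.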

Regarding "java - Get URLs from HTML data using a list (Android Studio)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29631868/
