I have the following page:
<div>
<h3>...</h3>
<span>...</span>
<p>...</p>
<span>...</span>
<span>...</span>
<span>...</span>
<p>...</p>
<span>...</span>
<span>...</span>
<hr />
<h3>...</h3>
<span>...</span>
<p>...</p>
<p>...</p>
<hr />
<h3>...</h3>
<span>...</span>
<span>...</span>
<p>...</p>
<p>...</p>
<hr />
</div>
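As you can see, most of the elements are on the same level. I am trying to figure out how to use Jsoup to scrape one block at a time.
A block means all elements starting with an <h3> and ending with an <hr> (there are 3 blocks in the example above).
The elements in between are not consistent, and their number can vary.
I have looked at the official API documentation, but could not find the right approach.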
Best answer
package stack;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Stack {
    public static void main(String[] args) throws Exception {
        File input = new File("test.html");
        Document doc = Jsoup.parse(input, "UTF-8");

        List<Elements> blocks = new ArrayList<>();
        Elements listofh3 = doc.getElementsByTag("h3");

        // Every <h3> opens a new block.
        for (Element h3 : listofh3) {
            Elements block = new Elements();
            block.add(h3);
            Element cursor = h3;
            // Collect the following siblings up to and including the closing <hr>.
            // The null check avoids a NullPointerException if a block has no closing <hr>.
            while (!cursor.tagName().equals("hr") && cursor.nextElementSibling() != null) {
                cursor = cursor.nextElementSibling();
                block.add(cursor);
            }
            blocks.add(block);
        }

        // Print each block followed by a separator line.
        for (Elements block : blocks) {
            System.out.println(block);
            System.out.println("----------------------------");
        }
    }
}
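The loop above starts from each <h3> and walks its siblings until it reaches an <hr>. The same grouping could also be done in a single pass over the parent's children. The following is only a minimal sketch of that idea in plain Java, so it can be run without Jsoup or test.html. The class name BlockSplitter and the method split are hypothetical names, not part of Jsoup or of the original answer. Unlike the code above, this sketch leaves the closing marker out of each block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper (not part of Jsoup): groups a flat list of siblings into blocks.
public class BlockSplitter {

    /**
     * A block opens at an item whose tag equals startTag and closes at the next
     * item whose tag equals endTag. The end marker itself is not included.
     * Items that appear outside any block are skipped. A block that is still
     * open at the end of the list (no closing marker) is returned as well.
     */
    public static <T> List<List<T>> split(List<T> siblings,
                                          Function<? super T, String> tagOf,
                                          String startTag,
                                          String endTag) {
        List<List<T>> blocks = new ArrayList<>();
        List<T> current = null; // null means "currently not inside a block"
        for (T item : siblings) {
            String tag = tagOf.apply(item);
            if (tag.equals(startTag)) {
                if (current != null) {
                    blocks.add(current); // previous block had no closing marker
                }
                current = new ArrayList<>();
                current.add(item);
            } else if (tag.equals(endTag)) {
                if (current != null) {
                    blocks.add(current);
                    current = null;
                }
            } else if (current != null) {
                current.add(item);
            }
        }
        if (current != null) {
            blocks.add(current); // trailing block without a closing marker
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Tag names only, standing in for real elements; the last <hr> is left out on purpose.
        List<String> tags = Arrays.asList(
                "h3", "span", "p", "p", "hr",
                "h3", "span", "span", "p", "p");
        List<List<String>> blocks = split(tags, t -> t, "h3", "hr");
        System.out.println(blocks); // [[h3, span, p, p], [h3, span, span, p, p]]
    }
}
```

With Jsoup on the classpath, a call such as BlockSplitter.split(div.children(), Element::tagName, "h3", "hr") should work in the same way, assuming div is the parent <div> element, because Elements is a List<Element>.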
Another solution could look like this:
package stack;

import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Stack {
    public static void main(String[] args) throws Exception {
        File input = new File("test.html");
        Document doc = Jsoup.parse(input, "UTF-8");

        Elements listofh3 = doc.getElementsByTag("h3");
        for (Element h3 : listofh3) {
            // Wrapper element that will hold one complete block.
            Element span = doc.createElement("span");
            span.addClass("block");

            Element cursor = h3;
            // The null check avoids a NullPointerException if a block has no closing <hr>.
            while (cursor != null && !cursor.tagName().equals("hr")) {
                // Look up the next sibling before moving cursor into the wrapper.
                Element next = cursor.nextElementSibling();
                span.appendChild(cursor);
                cursor = next;
            }
            if (cursor != null) {
                cursor.remove(); // remove the closing <hr>
            }
            doc.body().appendChild(span);
        }
        System.out.println(doc);
    }
}
Test input
<div>
<h3>header 1</h3>
<span>span 1</span>
<p>p 1</p>
<span>span 11</span>
<span>span 111</span>
<span>span 1111</span>
<p>p 11</p>
<span>span 11111</span>
<span>span 111111</span>
<hr />
<h3>header 2</h3>
<span>span 2</span>
<p>p 2</p>
<p>p 22</p>
<hr />
<h3>header 3</h3>
<span>span 3</span>
<span>span 33</span>
<p>p 3</p>
<p>p 33</p>
<hr />
</div>
Regarding "java - Jsoup: extract a group of selectors until a specified selector is reached", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31902540/