我正在尝试抓取以下 html:
<table>
<tr>
<td class="cellRight" style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellRight" style="border:0;color:#0066CC;"
title="View summary" width="70%">92%</td>
<td class="cellRight" style="border:0;" width="30%">
</td>
</tr>
</table>
</td>
</tr>
<tr class="listroweven">
<td class="cellLeft" nowrap><span class="categorytab" onclick=
"showAssignmentsByMPAndCourse('08/03/2015','58100:6');" title=
"Display Assignments for Art 5 with Ms. Martinho"><span style=
"text-decoration: underline">58100/6 - Art 5 with Ms.
Martinho</span></span></td>
<td class="cellLeft" nowrap>
Martinho, Suzette<br>
<b>Email:</b> <a href="mailto:smartinho@mtsd.us" style=
"text-decoration:none"><img alt="" border="0" src=
"/genesis/images/labelIcon.png" title=
"Send e-mail to teacher"></a>
</td>
<td class="cellRight" onclick=
"window.location.href = '/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=coursesummary&studentid=100916&action=form&courseCode=58100&courseSection=6&mp=MP4';"
style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellCenter"><span style=
"font-style:italic;color:brown;font-size: 8pt;">No
Grades</span></td>
</tr>
</table>
</td>
</tr>
<tr class="listrowodd">
<td class="cellLeft" nowrap><span class="categorytab" onclick=
"showAssignmentsByMPAndCourse('08/03/2015','58200:10');" title=
"Display Assignments for Family and Consumer Sciences 5 with Sheerin">
<span style="text-decoration: underline">58200/10 - Family and
Consumer Sciences 5 with Sheerin</span></span></td>
<td class="cellLeft" nowrap>
Sheerin, Susan<br>
<b>Email:</b> <a href="mailto:ssheerin@mtsd.us" style=
"text-decoration:none"><img alt="" border="0" src=
"/genesis/images/labelIcon.png" title=
"Send e-mail to teacher"></a>
</td>
<td class="cellRight" style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellCenter"><span style=
"font-style:italic;color:brown;font-size: 8pt;">No
Grades</span></td>
</tr>
</table>
</td>
</tr>
</table>
我正在尝试提取学生成绩的值,如果没有成绩,则值“无成绩”将出现在 html 中(如果是这种情况)。但是,当我执行如下选择请求时:
doc.select("[class=cellRight]")
我得到一个输出,其中所有成绩值都列出了两次(因为它们嵌套在包含 [class=cellRight] 区分符的两个元素中,以及正常数量的“无成绩”列表。所以我的问题是,如何我可以只选择包含区分符 [class=cellRight] 的文档中最里面的子项吗?(我已经处理了空白值的问题)感谢所有帮助!!
最佳答案
这有很多可能性。
其中之一是:测试每个“cellRight”元素的所有父元素是否也携带该类。如果发现则丢弃:
List<Element> keepList = new ArrayList<>();
Elements els = doc.select(".cellRight");
for (Element el : els){
boolean keep = true;
for (Element parentEl : el.parents()){
if (parentEl.hasClass("cellRight")){
//parent has class as well -> discard!
keep = false;
break;
}
}
if (keep){
keepList.add(el);
}
}
//keepList now contains inner most elements with your class
请注意,这是在没有编译器的情况下编写的,并且是出于我的想法。可能存在拼写/语法错误。
其他说明。仅当存在这个单一类时,您对 "[class=cellRight]"
的使用才能正常工作。对于随机顺序的多个类(这完全是预料之中的),最好使用点语法 ".cellRight"
关于java - 选择元素 Jsoup 最里面的子元素,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31886973/