java - How to get child classes from a web page using jsoup

Tags: java html css jsoup

I'm using jsoup to find classes on a web page like this:

        Document doc = null;
        try {
            doc = Jsoup.connect(strings[0]).get();
            // Get document (HTML page) title
            String title = doc.title();
            // Get meta info
            Elements metaElems = doc.select("div");
            for (Element metaElem : metaElems) {

                if (metaElem.hasClass("job-title")){
                    System.out.println("found a job  " + "\r\n" + metaElem.toString() ); //this works finds all job titles and links
                }
                if (metaElem.hasClass("detail-body")){
                    System.out.println("detail-body " + "\r\n" + metaElem.toString() ); //this works, finds all detail-body blocks
                }

            }
        } catch (IOException e) {
            e.printStackTrace();
        }

Now I want to get the child classes inside the detail-body (see the snippet below), such as:

<li class="location"> 

I tried to get it from within my loop, like this:

if (metaElem.hasClass("location")){
    //do stuff
}

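This doesn't work. I think it's because the element is a child of the detail-body element, but I'm fairly new to this, so that may be wrong. Below is a snippet of what the code above prints (one iteration). Can someone tell me how to get the inner information without using substring? (Not that I'm against it; I'd just rather keep everything well organized. If it's the only way, I can use substring.)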

found a job  
<div class="job-title"> 
<a href="/job/class-2-driver/driveforce-job77343151" data-dynamic-qs="?entryurl=%2fjobs%2fin-%3fradius%3d5%2377343151" title="See details for a Class 2 Driver in West Midlands (matches on class 2 driver)"> <h2>Class 2 Driver</h2> 
</a> 
</div>
detail-body 
 <div class="detail-body"> 
  <div class="row"> 
   <div id="headerListContainer" class="col-xs-12 col-sm-8"> 
 <div class="applied-col pull-right" style="display: none;"> 
  <span class="applied-icon">Applied</span> 
 </div> 
 <ul class="header-list">
  <li class="location"> <span> <span> <a href="/jobs/in-west-midlands">West 
     Midlands</a>, <span>WR1 1UK</span> </span> </span> </li> 
  <li class="salary" title="salary">Salary ranges from &pound;9-&pound;10 
   pounds per hour</li> 
 </ul> 
 </div> 
 <div id="recruiterImageContainer" class="col-xs-5 col-sm-4 pull-right"> 
 <div class="recruiter-image"> 
  <a href="/jobs-at/driveforce/jobs" title="DriveForce"> <img data-original="/companylogos/0b01e3bd83ec4919a44b7b145725e15a.png" class="lazy" /> </a> 
 </div> 
 </div> 
 <div class="col-xs-7 col-sm-8"> 
 <ul class="detail-list"> 
  <li class="job-type"> <span title="employment type">Contract</span> </li> 
  <li class="company" title="hiring organization"> <h3> <a href="/jobs-at/driveforce/jobs" title="DriveForce">DriveForce</a> </h3> </li> 
  <li class="date-posted" title="posted date"> <span> Today </span> </li> 
 </ul> 
 </div> 
 </div> 
 <div class="row detail-footer"> 
 <div class="col-sm-12 col-md-10"> 
 <div title="job details">    
      <p class="job-intro">DriveForce are currently recruiting for Class 2 
      drivers for a post based in Kidderminster! Driver duties will involve 
      going to 
      various locations disposing of confidential documents. To be 
      considered drivers 
      must have held their Class 2 license for 2 years or longer, and have 
      no more 
 than 6 points on their licens...</p> 
 </div> 
 </div> 
 <div class="email-job-col visible-xs visible-sm col-xs-4 col-md-2 col-sm-4"> 
 <button type="button" class="btn btn-default btn-sendjob" data-job-id="77343151" data-job-token="tqhJrOYD5cVRoPcna3gQfN/0cu8XVm1rV/LjjT2lvIz+o7dcujmniqJQMk8Kix2L" data-toggle="modal" data-target="#sendJobModal">Send</button> 
 </div> 
 <div class="see-job-col visible-xs visible-sm col-xs-4 col-sm-4 col-md-4 "> 
 <a class="btn btn-default btn-seejob" href="/job/class-2-driver/driveforce-job77343151" data-dynamic-qs="?entryurl=%2fjobs%2fin-%3fradius%3d5%2377343151">See</a> 
 </div> 
 <div class="save-job-col col-xs-4 col-sm-4 col-md-2"> 
 <button id="77343151" class="saved-jobs-icon btn btn-default btn-savejob 
   btn-mobile-hover-fix disabled" disabled="disabled">Save</button> 
 </div> 
 </div> 
 <div class="row hidden-xs"> 
 <div class="col-xs-12"> 
 <div class="discipline-related-links"> 
  <ul> 
   <li class="col-xs-12 col-md-6"><a href="/jobs/logistics/in-west-midlands">See more Logistics jobs in West Midlands</a></li> 
   <li class="col-xs-12 col-md-6"><a href="/jobs/logistics">See all 
     Logistics jobs</a></li> 
  </ul> 
      </div> 
     </div> 
    </div> 
  </div>
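A note on why the loop in the question never matches: `doc.select("div")` returns only `div` elements, so an `<li class="location">` is never the `metaElem` being tested, and `hasClass("location")` is always false. A minimal sketch of one way to keep a per-posting loop, assuming jsoup is on the classpath, is to iterate over each `div.detail-body` and run a selector *inside* that element. The class name `DetailBodyLoopDemo`, the method `describeLocations`, and the two-posting `SAMPLE_HTML` are hypothetical, trimmed to the shape of the markup printed above; the second posting is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DetailBodyLoopDemo {

    // Hypothetical page with two postings shaped like the one printed in the question.
    static final String SAMPLE_HTML =
            "<div class=\"detail-body\"><ul class=\"header-list\">"
          + "<li class=\"location\"><a href=\"/jobs/in-west-midlands\">West Midlands</a>, <span>WR1 1UK</span></li>"
          + "</ul></div>"
          + "<div class=\"detail-body\"><ul class=\"header-list\">"
          + "<li class=\"location\"><a href=\"/jobs/in-example-town\">Example Town</a></li>"
          + "</ul></div>";

    // For each posting, returns "<location text> -> <link href>" without any substring work.
    static List<String> describeLocations(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        // Iterate over each posting's detail-body, then search only inside that element.
        for (Element body : doc.select("div.detail-body")) {
            Element location = body.select("li.location").first();
            if (location == null) {
                continue; // this posting has no location entry
            }
            Element link = location.select("a").first();
            String href = (link != null) ? link.attr("href") : "";
            // text() strips the tags and collapses whitespace.
            out.add(location.text() + " -> " + href);
        }
        return out;
    }

    public static void main(String[] args) {
        describeLocations(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

Scoping the query to `body` keeps each location paired with its own posting, which a single document-wide `select` would not do.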

Best answer

JSoup's selector syntax lets you select all elements of a particular type that have a given class.

In your question you said:

i want to get child classes in the detail-body (see snippet below) like this <li class="location">

The following selector finds all elements of type li with class location, regardless of how deeply they are nested.

Document doc = Jsoup.parse(html);

Elements elements = doc.select("li.location");
for (int i = 0; i < elements.size(); i++) {
    System.out.println(elements.get(i).text());
}
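A self-contained sketch of the selector above, assuming jsoup is on the classpath. `SAMPLE_HTML` is a trimmed copy of the markup from the question, and the class and method names (`LocationSelectorDemo`, `locations`) are hypothetical. It shows that `text()` already returns the inner text with tags removed, so no substring handling is needed.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LocationSelectorDemo {

    // A trimmed copy of the markup printed in the question (only the parts needed here).
    static final String SAMPLE_HTML =
            "<div class=\"detail-body\">"
          + "  <ul class=\"header-list\">"
          + "    <li class=\"location\"> <span> <span> <a href=\"/jobs/in-west-midlands\">West Midlands</a>, <span>WR1 1UK</span> </span> </span> </li>"
          + "    <li class=\"salary\" title=\"salary\">Salary ranges from &pound;9-&pound;10 pounds per hour</li>"
          + "  </ul>"
          + "</div>";

    // Returns the normalized text of every <li class="location">, however deeply nested.
    static List<String> locations(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element li : doc.select("li.location")) {
            result.add(li.text()); // text() collapses whitespace and drops the tags
        }
        return result;
    }

    public static void main(String[] args) {
        for (String location : locations(SAMPLE_HTML)) {
            System.out.println(location); // prints "West Midlands, WR1 1UK"
        }
    }
}
```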

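In the HTML attached to the OP, the li with class=location is nested inside a ul with class=header-list, so you can use JSoup's parent/child awareness to select only li elements with class location that are direct children of <ul class="header-list">, for example: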

Elements elements = doc.select("ul.header-list > li.location");
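To see the difference between the descendant selector and the `>` child combinator, here is a small sketch, assuming jsoup is on the classpath. The markup is hypothetical: besides the `li.location` under `ul.header-list` from the question, it adds a made-up second `li.location` under `ul.detail-list`. The names `ChildCombinatorDemo` and `count` are likewise illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ChildCombinatorDemo {

    // Hypothetical markup: one li.location under ul.header-list (as in the question)
    // plus a made-up second li.location under ul.detail-list.
    static final String SAMPLE_HTML =
            "<div class=\"detail-body\">"
          + "<ul class=\"header-list\"><li class=\"location\">West Midlands, WR1 1UK</li></ul>"
          + "<ul class=\"detail-list\"><li class=\"location\">Somewhere else</li></ul>"
          + "</div>";

    // Number of elements the CSS query matches in the given HTML.
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Plain selector: li.location anywhere in the document.
        System.out.println(count(SAMPLE_HTML, "li.location"));                  // 2
        // Child combinator: only li.location whose direct parent is ul.header-list.
        System.out.println(count(SAMPLE_HTML, "ul.header-list > li.location")); // 1
    }
}
```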

For java - How to get child classes from a web page using jsoup, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46958554/
