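python - Options for using BeautifulSoup with basic tables - no class or id

Tags: python beautifulsoup html-table

Is there a recommended way to use BeautifulSoup 4 in Python when your tables have no class or other attribute values?

I was thinking of just dumping the text with get_text(), but how should I go about it if I want to pick out individual values or break the tables into more discrete parts?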

<table cellpadding="0" cellspacing="0" id="programmeDescriptor" width="100%">
  <tr>
    <td>
      <table cellpadding="5" cellspacing="0" class="borders" width="100%">
        <tr>
          <th colspan="1">
            Awards
          </th>
        </tr>
        <tr>
        </tr>
        <tr>
          <td>
            Ordinary Bachelor Degree
          </td>
        </tr>
      </table>
      <table border="0" cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td>
            <table cellpadding="5" cellspacing="0" class="borders">
              <tr>
                <th width="160">
                  Programme Code:
                </th>
                <td width="150">
                  CodeValue
                </td>
              </tr>
            </table>
          </td>
          <td width="5">
          </td>
          <td>
            <table cellpadding="5" cellspacing="0" class="borders">
              <tr>
                <th width="160">
                  Mode of Delivery:
                </th>
                <td width="150">
                  Full Time
                </td>
              </tr>
            </table>
          </td>
          <td width="5">
          </td>
          <td>
            <table cellpadding="5" cellspacing="0" class="borders">
              <tr>
                <th width="160">
                  No. of Semesters:
                </th>
                <td width="150">
                  6
                </td>
              </tr>
            </table>
          </td>
        </tr>
        <tr>
          <td>
            <table cellpadding="5" cellspacing="0" class="borders">
              <tr>
                <th width="160">
                  NFQ Level:
                </th>
                <td width="150">
                  7
                </td>
              </tr>
            </table>
          </td>
        </tr>
        <tr>
          <td>
            <table cellpadding="5" cellspacing="0" class="borders">
              <tr>
                <th width="160">
                  Embedded Award:
                </th>
                <td width="150">
                  No
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <table cellpadding="5" cellspacing="0" class="borders" width="100%">
        <tr>
          <th width="160">
            Department:
          </th>
          <td>
            Computing
          </td>
        </tr>
      </table>
      <div class="pageBreak">
      </div>
      <h3>
    Programme Outcomes
   </h3>
      <p class="info">
        On successful completion of this programme the learner will be able to :
      </p>
      <table cellpadding="5" cellspacing="0" class="borders" width="100%">
        <tr>
          <th width="30">
            PO1
          </th>
          <td class="head" colspan="2">
            Knowledge - Breadth
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO2
          </th>
          <td class="head" colspan="2">
            Knowledge - Kind
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO3
          </th>
          <td class="head" colspan="2">
            Skill - Range
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO4
          </th>
          <td class="head" colspan="2">
            Skill - Selectivity
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO5
          </th>
          <td class="head" colspan="2">
            Competence - Context
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO6
          </th>
          <td class="head" colspan="2">
            Competence - Role
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO7
          </th>
          <td class="head" colspan="2">
            Competence - Learning to Learn
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • Some block of text
          </td>
        </tr>
        <tr>
          <th width="30">
            PO8
          </th>
          <td class="head" colspan="2">
            Competence - Insight
          </td>
        </tr>
        <tr>
          <td class="head" width="30">
          </td>
          <td class="head" width="30">
            (a)
          </td>
          <td>
            • The graduate will demonstrate the ability to specify, design and build an IT system or research &amp; report on a current IT topic
          </td>
        </tr>
      </table>
      <div class="pageBreak">
      </div>
      <h3>
    Semester Schedules
   </h3>
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td colspan="2">
            <h4>
       Stage 1 / Semester 1
      </h4>
          </td>
        </tr>
        <tr>
          <td colspan="2">
            <table cellpadding="5" cellspacing="0" class="borders" width="100%">
              <tr>
                <td class="head" colspan="2">
                  Mandatory
                </td>
              </tr>
              <tr>
                <th width="50">
                  Module Code
                </th>
                <th>
                  Module Title
                </th>
              </tr>
              <tr>
                <td>
                  Code 
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3897" target="_blank">
          Web &amp; User Experience
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3881" target="_blank">
          Software Development 1
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/1645" target="_blank">
          Computer Architecture
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2328" target="_blank">
          Discrete Mathematics 1
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3848" target="_blank">
          Business &amp; Information Systems
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2054" target="_blank">
          Learning to Learn at Third Level
         </a>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td colspan="2">
            <h4>
       Stage 1 / Semester 2
      </h4>
          </td>
        </tr>
        <tr>
          <td colspan="2">
            <table cellpadding="5" cellspacing="0" class="borders" width="100%">
              <tr>
                <td class="head" colspan="2">
                  Mandatory
                </td>
              </tr>
              <tr>
                <th width="50">
                  Module Code
                </th>
                <th>
                  Module Title
                </th>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3886" target="_blank">
          Software Development 2
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3895" target="_blank">
          Object Oriented Systems Analysis
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3875" target="_blank">
          Database Fundamentals
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3874" target="_blank">
          Operating Systems Fundamentals
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2330" target="_blank">
          Statistics
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2527" target="_blank">
          Social Media Communications
         </a>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <div class="pageBreak">
      </div>
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td colspan="2">
            <h4>
       Stage 2 / Semester 1
      </h4>
          </td>
        </tr>
        <tr>
          <td colspan="2">
            <table cellpadding="5" cellspacing="0" class="borders" width="100%">
              <tr>
                <td class="head" colspan="2">
                  Mandatory
                </td>
              </tr>
              <tr>
                <th width="50">
                  Module Code
                </th>
                <th>
                  Module Title
                </th>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3877" target="_blank">
          Web &amp; Mobile Design &amp; Development
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3876" target="_blank">
          Database Design And Programming
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3869" target="_blank">
          Software Development 3
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3873" target="_blank">
          Software Quality Assurance and Testing
         </a>
                </td>
              </tr>
              <tr>
                <td>
                 Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3629" target="_blank">
          Networking 1
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2477" target="_blank">
          Discrete Mathematics 2
         </a>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td colspan="2">
            <h4>
       Stage 2 / Semester 2
      </h4>
          </td>
        </tr>
        <tr>
          <td colspan="2">
            <table cellpadding="5" cellspacing="0" class="borders" width="100%">
              <tr>
                <td class="head" colspan="2">
                  Mandatory
                </td>
              </tr>
              <tr>
                <th width="50">
                  Module Code
                </th>
                <th>
                  Module Title
                </th>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3862" target="_blank">
          Project
         </a>
                </td>
              </tr>
              <tr>
                <td>
                 Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3911" target="_blank">
          Object Oriented Analysis &amp; Design 1
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3877" target="_blank">
          Web &amp; Mobile Design &amp; Development
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3630" target="_blank">
          Networking 2
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3870" target="_blank">
          Software Development 4
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2476" target="_blank">
          Management Science
         </a>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <div class="pageBreak">
      </div>
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td colspan="2">
            <h4>
       Stage 3 / Semester 1
      </h4>
          </td>
        </tr>
        <tr>
          <td colspan="2">
            <table cellpadding="5" cellspacing="0" class="borders" width="100%">
              <tr>
                <td class="head" colspan="2">
                  Mandatory
                </td>
              </tr>
              <tr>
                <th width="50">
                  Module Code
                </th>
                <th>
                  Module Title
                </th>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3911" target="_blank">
          Object Oriented Analysis &amp; Design 1
         </a>
                </td>
              </tr>
              <tr>
                <td>
                 Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3899" target="_blank">
          Operating Systems
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/1721" target="_blank">
          Cloud Services &amp; Distributed Computing
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2580" target="_blank">
          Innovation &amp; Entrepreneurship
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3878" target="_blank">
          Web Application Development
         </a>
                </td>
              </tr>
              <tr>
                <td>
                 Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/1689" target="_blank">
          Algorithms and Data Structures 1
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2025" target="_blank">
          Logic and Problem Solving
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3896" target="_blank">
          Advanced Databases
         </a>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td colspan="2">
            <h4>
       Stage 3 / Semester 2
      </h4>
          </td>
        </tr>
        <tr>
          <td colspan="2">
            <table cellpadding="5" cellspacing="0" class="borders" width="100%">
              <tr>
                <td class="head" colspan="2">
                  Mandatory
                </td>
              </tr>
              <tr>
                <th width="50">
                  Module Code
                </th>
                <th>
                  Module Title
                </th>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2465" target="_blank">
          Project
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/1728" target="_blank">
          Algorithms and Data Structures 2
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/1675" target="_blank">
          Network Management
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2025" target="_blank">
          Logic and Problem Solving
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/3899" target="_blank">
          Operating Systems
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/2580" target="_blank">
          Innovation &amp; Entrepreneurship
         </a>
                </td>
              </tr>
              <tr>
                <td>
                  Code
                </td>
                <td>
                  <a href="index.cfm/page/module/moduleId/1679" target="_blank">
          Object Oriented Analysis &amp; Design 2
         </a>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      </td>
  </tr>
</table>

Best answer

First, the outermost table (the parent of all the other tables) has an id attribute, so let's use it as the base for the search:

super_table = soup.find("table", id="programmeDescriptor")
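
The line above assumes a `soup` object already exists. Here is a minimal sketch of how it might be built and checked. The `html` string is a trimmed stand-in for the markup in the question, and the choice of the built-in `html.parser` backend is an assumption; another installed parser should work as well.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the question's markup (assumption: the real page is larger).
html = """
<table id="programmeDescriptor">
  <tr><td>
    <table class="borders">
      <tr><th colspan="1"> Awards </th></tr>
      <tr><td> Ordinary Bachelor Degree </td></tr>
    </table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
super_table = soup.find("table", id="programmeDescriptor")

# find() returns None when nothing matches, so check before using the result.
if super_table is None:
    raise ValueError("outer table not found")

# find_all() on a tag searches its descendants only, so this counts the nested tables.
nested_tables = super_table.find_all("table")
print(len(nested_tables))
```

Searching from `super_table` rather than from `soup` keeps later lookups scoped to this block of the page.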

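Then, based on what you mentioned in the comments, it looks like you can tell the inner tables apart by their headers. One way to implement this is to find the header cell and then use find_parent() to locate its parent table: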

def get_table_by_header_name(super_table, header):
    return super_table.find("th", text=header).find_parent("table")

Usage:

desired_table = get_table_by_header_name(super_table, "Awards")
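
In the markup as pasted in the question, the text inside each `<th>` is padded with newlines and spaces, so an exact text match on "Awards" may find nothing, in which case `find()` returns None and the chained `find_parent()` call fails. The following is a sketch of a whitespace-tolerant lookup, together with one way to pick out individual values as the question asks. The helper names `find_table_by_header` and `label_value_pairs` are hypothetical, and the `html` string is a trimmed stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the question's markup, keeping its padded cell text.
html = """
<table id="programmeDescriptor"><tr><td>
  <table class="borders">
    <tr>
      <th width="160">
        Programme Code:
      </th>
      <td width="150">
        CodeValue
      </td>
    </tr>
  </table>
  <table class="borders">
    <tr>
      <th width="160">
        Department:
      </th>
      <td>
        Computing
      </td>
    </tr>
  </table>
</td></tr></table>
"""


def find_table_by_header(root, header):
    """Return the nearest table enclosing the first <th> whose stripped text equals header."""
    for th in root.find_all("th"):
        if th.get_text(strip=True) == header:
            return th.find_parent("table")
    return None  # no header matched


def label_value_pairs(root):
    """Map each <th> label to the text of the next <td> sibling in the same row."""
    pairs = {}
    for th in root.find_all("th"):
        td = th.find_next_sibling("td")
        if td is not None:  # header-only rows have no value cell
            label = th.get_text(strip=True).rstrip(":")
            pairs[label] = td.get_text(strip=True)
    return pairs


soup = BeautifulSoup(html, "html.parser")
super_table = soup.find("table", id="programmeDescriptor")

department_table = find_table_by_header(super_table, "Department:")
pairs = label_value_pairs(super_table)
print(pairs)
```

Comparing against `get_text(strip=True)` avoids depending on how the page was indented. Rows that contain only `<th>` cells, such as the "Module Code" / "Module Title" header rows, are skipped by `label_value_pairs` because they have no `<td>` sibling.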

Regarding "python - Options for using BeautifulSoup with basic tables - no class or id", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33919806/
