python - How to parse an HTML table with rowspans in Python?

Tags: python html python-3.x beautifulsoup html-table

The problem

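I'm trying to parse an HTML table that contains rowspans; specifically, I'm trying to parse my college schedule.

The problem I'm running into is that if the preceding row contains a cell with a rowspan, the next row is missing a TD: the spanning cell occupies the position where that TD would have been, so every later cell in the row is counted one column too early.

I have no idea how to account for this, and I'd like to be able to parse this schedule.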

What I've tried

Pretty much everything I could think of.

The result I get

[
    {
        'blok_eind': 4,
        'blok_start': 3,
        'dag': 4, # Should be 5
        'leraar': 'DOODF000',
        'lokaal': 'ALK C212',
        'vak': 'PROJ-T',
    },
]

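As you can see in the output snippet above, the entry whose vak is PROJ-T has dag set to 4, whereas it should be 5 (i.e. Friday/Vrijdag), as the timetable shows:

[Image: the timetable]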

The result I want

A Python dict() that looks like the one posted above, but with the correct values.

Where:

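  • day/dag is an int from 1 to 5, representing Monday to Friday
  • block_start/blok_start is an int giving the time block in which the class starts (the time blocks are listed on the left side of the table)
  • block_end/blok_eind is an int giving the time block in which the class ends
  • classroom/lokaal is the code of the classroom the class is held in
  • teacher/leraar is the teacher's ID
  • course/vak is the ID of the class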
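
The field list above can be written down as a typed record. This is only a sketch: the name `RosterEntry` and the variable `expected` are hypothetical and do not appear in the original code, which uses plain dicts.

```python
from typing import TypedDict


class RosterEntry(TypedDict):
    """Hypothetical record type for one class in the schedule."""
    dag: int         # day, 1 (Monday) to 5 (Friday)
    blok_start: int  # time block in which the class starts
    blok_eind: int   # time block in which the class ends
    lokaal: str      # classroom code
    leraar: str      # teacher ID
    vak: str         # course ID


# The entry from the snippet above, with the day corrected to Friday.
expected: RosterEntry = {
    "dag": 5,
    "blok_start": 3,
    "blok_eind": 4,
    "lokaal": "ALK C212",
    "leraar": "DOODF000",
    "vak": "PROJ-T",
}
```

At runtime a TypedDict is an ordinary dict, so it can still be passed to pprint or compared with the parser's output.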

Basic HTML structure of the data above

<center>
    <table>
        <tr>
            <td>
                <table>
                    <tbody>
                        <tr>
                            <td>
                                <font>
                                    TEACHER-ID
                                </font>
                            </td>
                            <td>
                                <font>
                                    <b>
                                        CLASSROOM ID
                                    </b>
                                </font>
                            </td>
                        </tr>
                        <tr>
                            <td>
                                <font>
                                    COURSE ID
                                </font>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
    </table>
</center>

Code

HTML

<CENTER><font size="3" face="Arial" color="#000000">
<BR></font>
  <font size="6" face="Arial" color="#0000FF">
16AO4EIO1B
&nbsp;</font> <font size="4" face="Arial">
IO1B
</font>
  <BR>
  <TABLE border="3" rules="all" cellpadding="1" cellspacing="1">
    <TR>
      <TD align="center">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial" color="#000000">
Maandag 29-08
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Dinsdag 30-08
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Woensdag 31-08
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Donderdag 01-09
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Vrijdag 02-09
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>1</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
8:30
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
9:20
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
BLEEJ002
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B021</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
WEBD
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>2</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
9:20
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
10:10
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
BLEEJ002
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B021B</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
WEBD
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>3</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
10:25
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
11:15
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
DOODF000
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK C212</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
PROJ-T
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>4</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
11:15
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:05
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
BLEEJ002
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B021B</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
MENT
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>5</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:05
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:55
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>6</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:55
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
13:45
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
JONGJ003
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B008</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
BURG
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>7</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
13:45
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
14:35
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
FLUIP000
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B004</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
ICT algemeen  Prakti
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>8</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
14:50
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
15:40
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
KOOLE000
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B008</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
NED
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>9</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
15:40
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
16:30
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>10</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
16:30
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
17:20
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
  </TABLE>
  <TABLE cellspacing="1" cellpadding="1">
    <TR>
      <TD valign=bottom> <font size="4" face="Arial" color="#0000FF"></TR></TABLE><font size="3" face="Arial">
Periode1   29-08-2016 (35) - 04-09-2016 (35)   G r u b e r  &amp;  P e t t e r s   S o f t w a r e
</font></CENTER>

Python

from pprint import pprint
from bs4 import BeautifulSoup
import requests

r = requests.get("http://rooster.horizoncollege.nl/rstr/ECO/AMR/400-ECO/Roosters/36"
                 "/c/c00025.htm")
daytable = {
    1: "Maandag",
    2: "Dinsdag",
    3: "Woensdag",
    4: "Donderdag",
    5: "Vrijdag"
}
timetable = {
    1: ("8:30", "9:20"),
    2: ("9:20", "10:10"),
    3: ("10:25", "11:15"),
    4: ("11:15", "12:05"),
    5: ("12:05", "12:55"),
    6: ("12:55", "13:45"),
    7: ("13:45", "14:35"),
    8: ("14:50", "15:40"),
    9: ("15:40", "16:30"),
    10: ("16:30", "17:20"),
}

page = BeautifulSoup(r.content, "lxml")

roster = []
big_rows = 2
last_row_big = False
# There are 10 blocks, each made up out of 2 TR's, run through them
for block_count in range(2, 22, 2):
    # There are 5 days, first column is not data we want
    for day in range(2, 7):
        dayroster = {
            "dag": 0,
            "blok_start": 0,
            "blok_eind": 0,
            "lokaal": "",
            "leraar": "",
            "vak": ""
        }
        # This selector provides the classroom
        table_bold = page.select(
            "html > body > center > table > tr:nth-of-type(" + str(block_count) + ") > td:nth-of-type(" + str(
                day) + ") > table > tr > td > font > b")

        # This selector provides the teacher's code and the course ID
        table = page.select(
            "html > body > center > table > tr:nth-of-type(" + str(block_count) + ") > td:nth-of-type(" + str(
                day) + ") > table > tr > td > font")

        # This gets the rowspan on the current row and column
        rowspan = page.select(
            "html > body > center > table > tr:nth-of-type(" + str(block_count) + ") > td:nth-of-type(" + str(
                day) + ")")

        try:
            if table or table_bold and rowspan[0].attrs.get("rowspan") == "4":
                last_row_big = True
                # Setting end of class
                dayroster["blok_eind"] = (block_count // 2) + 1
            else:
                last_row_big = False
                # Setting end of class
                dayroster["blok_eind"] = (block_count // 2)
        except IndexError:
            pass

        if table_bold:
            x = table_bold[0]
            # Classroom ID
            dayroster["lokaal"] = x.contents[0]

        if table:
            iter = 0
            for x in table:
                content = x.contents[0].lstrip("\r\n").rstrip("\r\n")
                # Cell has data
                if content != "":
                    # Set start of class
                    dayroster["blok_start"] = block_count // 2
                    # Set day of class
                    dayroster["dag"] = day - 1
                    if iter == 0:
                        # Teacher ID
                        dayroster["leraar"] = content
                    elif iter == 1:
                        # Course ID
                        dayroster["vak"] = content
                    iter += 1

        if table or table_bold:
            # Store the data
            roster.append(dayroster)

# Remove duplicates
seen = set()
new_l = []
for d in roster:
    t = tuple(d.items())
    if t not in seen:
        seen.add(t)
        new_l.append(d)
pprint(new_l)

Best answer

You have to keep track of the rowspans from preceding rows, one per column.

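You can do this simply by copying the integer value of a rowspan into a dictionary, and letting each subsequent row decrement that value until it drops to 1 (or store the integer value minus 1 and count down to 0, which is easier to code). You can then adjust the column count of cells in later rows based on the rowspans still active from earlier rows.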
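
As a minimal sketch of that bookkeeping on plain data (no HTML involved), the snippet below assumes each row is given as a list of (text, rowspan) tuples in source order. The function name `place_cells` and the sample `rows` are made up for illustration.

```python
def place_cells(rows, n_columns):
    """Yield (row_index, column_index, text, rowspan) for every cell.

    Each row lists only the cells present in the source, i.e. without
    the positions that a rowspan from an earlier row already covers.
    """
    remaining = {}  # column index -> rows still covered by an earlier cell
    for r, row in enumerate(rows):
        cells = iter(row)
        for col in range(n_columns):
            if remaining.get(col, 0):
                # column is occupied by a spanning cell from a previous row
                remaining[col] -= 1
                continue
            cell = next(cells, None)
            if cell is None:
                continue  # the row has fewer cells than free columns
            text, rowspan = cell
            remaining[col] = rowspan - 1  # store "span minus 1", count down to 0
            yield r, col, text, rowspan


rows = [
    [("a", 1), ("b", 2), ("c", 1)],
    [("d", 1), ("e", 1)],  # "b" still covers column 1 in this row
]
placed = list(place_cells(rows, 3))
```

Counting cells positionally would put "e" in column 1; with the dictionary it lands in column 2, the same kind of shift that turned day 5 into day 4 in the question.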

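Your table complicates this a little by using a default rowspan of 2 and increasing in steps of two, but dividing by 2 easily brings that back to manageable numbers.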
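
The arithmetic behind that normalisation is small enough to check by hand. In the sketch below, `extra_blocks` and `block_rows` are hypothetical names; the only facts taken from the page are that every time block occupies two table rows and that the first row is the day header.

```python
def extra_blocks(rowspan_attr=2, rows_per_block=2):
    """Extra time blocks a cell covers beyond the one it starts in."""
    return int(rowspan_attr) // rows_per_block - 1


print(extra_blocks("2"))  # one-block cell: 0 extra blocks
print(extra_blocks("4"))  # two-block class: 1 extra block

# Indices kept by a [1:21:2] slice over the selected rows: skip the
# header row 0, then keep the first row of each two-row block.
block_rows = list(range(22))[1:21:2]
```

The slice yields ten indices, one per time block, which is why enumerating the sliced rows from 1 gives the block number directly.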

Rather than use a lot of long CSS selectors, select just the table rows and iterate over those:

roster = []
rowspans = {}  # track rowspanning cells
# every second row in the table
rows = page.select('html > body > center > table > tr')[1:21:2]
for block, row in enumerate(rows, 1):
    # take direct child td cells (recursive=False skips the nested
    # tables), but skip the first cell:
    daycells = row.find_all('td', recursive=False)[1:]
    rowspan_offset = 0
    for daynum, daycell in enumerate(daycells, 1):
        # rowspan handling; if there is a rowspan here, adjust to find correct position
        daynum += rowspan_offset
        while rowspans.get(daynum, 0):
            rowspan_offset += 1
            rowspans[daynum] -= 1
            daynum += 1
        # now we have a correct day number for this cell, adjusted for
        # rowspanning cells.
        # update the rowspan accounting for this cell
        rowspan = (int(daycell.get('rowspan', 2)) // 2) - 1
        if rowspan:
            rowspans[daynum] = rowspan

        texts = daycell.select("table > tr > td > font")
        if texts:
            # class info found
            teacher, classroom, course = (c.get_text(strip=True) for c in texts)
            roster.append({
                'blok_start': block,
                'blok_eind': block + rowspan,
                'dag': daynum,
                'leraar': teacher,
                'lokaal': classroom,
                'vak': course
            })

    # days that were skipped at the end due to a rowspan
    while daynum < 5:
        daynum += 1
        if rowspans.get(daynum, 0):
            rowspans[daynum] -= 1

This produces the correct output:

[{'blok_eind': 2,
  'blok_start': 1,
  'dag': 5,
  'leraar': u'BLEEJ002',
  'lokaal': u'ALK B021',
  'vak': u'WEBD'},
 {'blok_eind': 3,
  'blok_start': 2,
  'dag': 3,
  'leraar': u'BLEEJ002',
  'lokaal': u'ALK B021B',
  'vak': u'WEBD'},
 {'blok_eind': 4,
  'blok_start': 3,
  'dag': 5,
  'leraar': u'DOODF000',
  'lokaal': u'ALK C212',
  'vak': u'PROJ-T'},
 {'blok_eind': 5,
  'blok_start': 4,
  'dag': 3,
  'leraar': u'BLEEJ002',
  'lokaal': u'ALK B021B',
  'vak': u'MENT'},
 {'blok_eind': 7,
  'blok_start': 6,
  'dag': 5,
  'leraar': u'JONGJ003',
  'lokaal': u'ALK B008',
  'vak': u'BURG'},
 {'blok_eind': 8,
  'blok_start': 7,
  'dag': 3,
  'leraar': u'FLUIP000',
  'lokaal': u'ALK B004',
  'vak': u'ICT algemeen  Prakti'},
 {'blok_eind': 9,
  'blok_start': 8,
  'dag': 5,
  'leraar': u'KOOLE000',
  'lokaal': u'ALK B008',
  'vak': u'NED'}]

Moreover, this code will continue to work even if classes span more than 2 blocks, or just one block; any rowspan size is supported.
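
One way to check that claim without fetching the page is to run the same loop on hand-written rows. The sketch below mirrors the answer's bookkeeping on plain tuples instead of BeautifulSoup elements; `parse_rows`, the sample rows and the labels are all invented for illustration.

```python
def parse_rows(rows, days=5):
    """Run the answer's rowspan bookkeeping on plain data.

    Each row lists the day cells actually present in the HTML row as
    (rowspan, label) tuples; label is None for an empty cell.
    """
    roster = []
    rowspans = {}  # day number -> blocks still covered by an earlier cell
    for block, cells in enumerate(rows, 1):
        daynum = 0
        rowspan_offset = 0
        for daynum, (span_attr, label) in enumerate(cells, 1):
            daynum += rowspan_offset
            while rowspans.get(daynum, 0):
                rowspan_offset += 1
                rowspans[daynum] -= 1
                daynum += 1
            rowspan = span_attr // 2 - 1
            if rowspan:
                rowspans[daynum] = rowspan
            if label is not None:
                roster.append({"dag": daynum, "blok_start": block,
                               "blok_eind": block + rowspan, "vak": label})
        # days skipped at the end of the row due to a rowspan
        while daynum < days:
            daynum += 1
            if rowspans.get(daynum, 0):
                rowspans[daynum] -= 1
    return roster


sample = [
    [(6, "LONG"), (2, None), (2, "SHORT"), (2, None), (4, "PROJ")],
    [(2, None), (2, None), (2, "X")],             # days 1 and 5 are covered
    [(2, None), (2, None), (2, None), (2, "Y")],  # day 1 is still covered
    [(2, "Z"), (2, None), (2, None), (2, None), (2, None)],
]
result = parse_rows(sample)
```

Here LONG covers three blocks and SHORT a single one, yet X, Y and Z still land on days 4, 5 and 1 respectively.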

Regarding python - How to parse an HTML table with rowspans in Python?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/39278376/
