python - NoneType error when trying to access the .text attribute of an existing <a> element

I am using BeautifulSoup to scrape the first wikitable on the page "List of military engagements during the Russian invasion of Ukraine" and get the names of all 57 battles. I attached an image of the table's HTML for reference (HTML of the wikitable).

To get all <a> elements in the first column and keep only their text (the battle names), I did the following:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_military_engagements_during_the_Russian_invasion_of_Ukraine'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.find('table')
rows = table.find_all('tr')

battlenames = []
for row in rows:
    # Find the first <td> element within the row
    td_element = row.find('td')
    if td_element:
        # Find the first <a> element within the <td> element
        battlename = td_element.find('a')
        cleanname = battlename.text
        battlenames.append(cleanname)

for name in battlenames:
    print(name)

I ran this in Spyder and in Jupyter Notebook and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[6], line 18
     15     if td_element:
     16         # Find the first <a> element within the <td> element
     17         battlename = td_element.find('a')
---> 18         cleanname = battlename.text
     19         battlenames.append(cleanname)
     21 for name in battlenames:

AttributeError: 'NoneType' object has no attribute 'text'

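This surprised me, because the first <td> of every row (<tr>) contains an <a> element with the battle name. That is, the first column of the table has no empty cells that would cause a NoneType error. What could be the problem?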

Best answer

EDIT

Following @Ouroboros1's comment, to be more precise: the problem is that some td elements do not contain an a.

The table contains one "sub" tr for "Battles of Voznesensk", whose first td holds "9 March 2022" from the "Start date" column. This td happens to contain no a link.
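
The situation described in the comment can be reproduced offline. The sketch below is only an illustration: the HTML is a made-up, simplified table (not the real Wikipedia markup), it assumes the name cell spans two rows via rowspan, and it uses the built-in 'html.parser' instead of 'lxml' so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified table: the name cell spans two rows (rowspan="2"),
# so the second data <tr> starts with a date cell instead of a name cell.
html = """
<table>
  <tr><th>Name</th><th>Start date</th></tr>
  <tr><td rowspan="2"><a href="/wiki/A">Battle A</a></td><td>2 March 2022</td></tr>
  <tr><td>9 March 2022</td></tr>
  <tr><td><a href="/wiki/B">Battle B</a></td><td>24 February 2022</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# For each row, record (first <td> exists, that <td> contains an <a>).
report = []
for row in rows:
    td = row.find("td")
    link = td.find("a") if td is not None else None
    report.append((td is not None, link is not None))

print(report)
```

The third row has a first td but no a inside it, so find('a') returns None there, and calling .text on that result is what raises the AttributeError.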

So before calling .text you also have to check whether an a was found:

if td_element:
    # Find the first <a> element within the <td> element
    battlename = td_element.find('a')
    # Check here that an <a> was actually found before using it
    if battlename:
        cleanname = battlename.text
        battlenames.append(cleanname)

You could also change your selection strategy and use CSS selectors to select only the tr and td elements that contain an a:

soup.table.select('tr:has(td:first-of-type a)')

Or even directly select all a elements in the first td of each tr:

soup.table.select('tr td:first-of-type a')

Example with CSS selectors

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_military_engagements_during_the_Russian_invasion_of_Ukraine'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

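# Option A: rows whose first <td> contains an <a>; print the first link's text
for row in soup.table.select('tr:has(td:first-of-type a)'):
    print(row.td.a.text)

# Option B: every <a> inside the first <td> of each row
for a in soup.table.select('tr td:first-of-type a'):
    print(a.text)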
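
The two options are not quite equivalent when a first cell holds more than one link. The following sketch shows the difference on a made-up table (hypothetical markup, parsed with 'html.parser' rather than 'lxml' to keep it dependency-free): Option A yields one name per matching row, while Option B yields every link in the first cell.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first cell of the first data row holds two links,
# and the second data row starts with a date cell that has no link.
html = """
<table>
  <tr><th>Name</th><th>Start date</th></tr>
  <tr><td><a href="/wiki/A">Battle A</a> (<a href="/wiki/A2">second phase</a>)</td><td>2 March 2022</td></tr>
  <tr><td>9 March 2022</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Option A: one entry per row whose first <td> contains an <a>
option_a = [row.td.a.text for row in soup.table.select("tr:has(td:first-of-type a)")]

# Option B: one entry per <a> found in the first <td> of any row
option_b = [a.text for a in soup.table.select("tr td:first-of-type a")]

print(option_a)
print(option_b)
```

If exactly one name per battle is wanted, Option A (or the explicit None check) is probably the safer choice; Option B is shorter but may return extra links.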

Source: the corresponding question on Stack Overflow, https://stackoverflow.com/questions/77364381/
