python - Determining whether HTML contains text using BS4 in Python

Tags: python web-scraping beautifulsoup

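I'm trying to scrape Wikipedia's chart of COVID-19 data for the United States (https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/United_States_medical_cases), but I'm having trouble determining whether an HTML element contains text. I've tried using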

element.text is not None

as the if condition, but that still lets through HTML elements that print nothing.

element.text != ''

gives the same result. Is there anything else I can check? Here is my full code:

def getCases(page):
    cases = []
    # the row header contains an element whose title is the first date
    firstCaseChild = page.find(title='January 21, 2020')
    firstCaseChild2 = firstCaseChild.find_parent('th')
    row = 0
    column = 0
    firstRow = []
    # walk the <td> cells that follow the header cell, stopping after 55 columns
    for case in firstCaseChild2.find_next_siblings('td'):
        if column == 55:
            break
        # intended to detect cells that contain text
        if case.text is not None:
            firstRow.append(case.text)
            column = column+1
            print(case.text)
        else:
            firstRow.append('0')
            column = column+1
            print('0')
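A likely explanation, shown as a minimal sketch on a made-up HTML fragment (the markup is hypothetical, not copied from the Wikipedia page): in BeautifulSoup, `Tag.text` always returns a string, so `is not None` is always true. A cell that looks empty often still contains whitespace such as a newline, so `!= ''` is true as well. Stripping the text first with `get_text(strip=True)` gives a more reliable emptiness check; `has_text` below is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical row: one filled cell, one whitespace-only cell, one truly empty cell.
html = """<table><tr>
<th>Jan 21</th>
<td>1</td>
<td>
</td>
<td></td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")


def has_text(tag):
    """Hypothetical helper: True if the tag has any non-whitespace text."""
    return bool(tag.get_text(strip=True))


for td in cells:
    # .text is always a str, so `is not None` never filters anything;
    # the whitespace-only cell also passes `!= ''` because its text is a newline
    print(repr(td.text), td.text is not None, td.text != "", has_text(td))
```

Only `has_text` separates the whitespace-only cell from the cell that contains `1`.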

Best answer

Another solution, without using pandas:

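import requests
from bs4 import BeautifulSoup


url = 'https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/United_States_medical_cases'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')

# soup.tbody is the first <tbody> on the page (assumed to be the data chart);
# 'tr:has(td)' keeps only rows that contain data cells, skipping header-only rows
for tr in soup.tbody.select('tr:has(td)'):
    # strip=True turns whitespace-only cells into ''
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    # replace empty text '' with 0; drop thousands separators, if any, before int()
    tds = [int(td.replace(',', '')) if td else 0 for td in tds]
    # right-align each value in a 5-character column
    print(('{:>5}'*len(tds)).format(*tds))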

Prints:

    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    1    0    0    0    0    0    0    1    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    1    0    0    0    0    0    0    0    0    3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    5    0    0    0    0    0    0    1    0    7    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    2    0    0    0    0    0
    0    0    5    0    0    0    0    0    0    1    0    5    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
    0    1   12    0    0    0    0    0    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    1    0    0    0    0    0    0    0    0    1    0    0    1    0    1    0    0    0    0    0    0    0
    0    0    4    0    0    0    0    0    0    0    0   11    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    1    9    0    0    0    0    0    0    0
    0    0    8    2    0    0    0    0    1    0    0   31    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    4    0    0    0    0    0    1    3    0    0    1   11    0    0    0    0    0    0    0
    0    1   11    6    1    0    0    0    1    0    1   10    0    0    1    1    0    0    1    0    0    1    0    1    0    0    0    0    3    1    1    0    0    1    0    0    3    0    0    0    0    0    5    0    0    0    2   22    2    1    0    0    0    0    0
    0    2    8    0    0    0    0    0    0    4    0   22    0    0    0    1    1    0    0    1    0    0    0    0    0    0    0    0    2    4    0    0    0    0    2    0    0    1    0    0    1    0    5    0    0    0    0   45    2    0    0    0    0    0    0
    0    0   26    0    1    0    0    0    2    7    0   34    0    3    1    2    0    0    1    0    0    2    0    0    0    0    0    0    4    4    3    0    0    0    4    2    4    1    0    1    1    0   15    2    0    2    2   17    2    0    1    0    0    0    0
    0    0   19    4    0    0    0    0    0    0    0   26    0    5    4    2    0    0    0    0    0    0    3    0    0    1    0    0    2    6    2    1    0    5    1    1    1    3    0    1    3    0   13    0    0    0    5   36    4    0    0    0    0    0    0
    0    1   24    5    0    0    0    0    1    1    1  105    0    5    8    4    0    2    1    0    0    2    1    1    5    1    0    0    9    5    2    2    0    0    2    3    4    4    0    0    0    0   51    1    0    1    4   31    2    2    0    0    0    0    0
    0    3   20   17    0    0    1    4    0    4    2   99    1    1    6    2    0    0    2    0    1    0    0    0    3    3    0    1    3    9    0   10    1    1    1    2    4    0    0    1    5    1    3    3    0    0    8   43    4    0    0    0    0    0    0
    1    0   21   15    0    0    0    2    0    5    1   91    0    2    7    0    4   10    4    1    0    5    1    1    0    2    0    5   18   11    3    6    0    7    2    9    2    8    0    3    0    3   13    3    1    1    6  112    6    0    1    0    0    0    0
    0    0   49   28    0    1    3    4    6    6    4  111    1    1   14    3    1   13    5    2    0    4    8    1    1   11    6    6   33    0    3   17    5    0    1    8   16   13    0    5    0    0   15    5    2    1   21   93   19   15    0    0    0    3    1

...and so on.
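Applying the same `get_text(strip=True)` idea to the question's original `getCases` approach might look like the sketch below. It assumes `page` is a `BeautifulSoup` object for the chart and keeps the question's lookup (an element with `title='January 21, 2020'` inside a `<th>`). The `first_date` and `max_columns` parameters, the `None` guards, the sample markup, and returning the row instead of printing each cell are additions for illustration.

```python
from bs4 import BeautifulSoup


def getCases(page, first_date="January 21, 2020", max_columns=55):
    """Return the texts of the cells after the header that links to first_date.

    Cells whose text is empty after stripping whitespace are recorded as '0'.
    """
    first_case_child = page.find(title=first_date)
    if first_case_child is None:
        return []  # the expected date element is not on this page
    header_cell = first_case_child.find_parent("th")
    if header_cell is None:
        return []  # the date element is not inside a header cell
    first_row = []
    # limit= stops after max_columns cells, replacing the manual column counter
    for case in header_cell.find_next_siblings("td", limit=max_columns):
        text = case.get_text(strip=True)  # '' for empty or whitespace-only cells
        first_row.append(text if text else "0")
    return first_row


# Hypothetical markup standing in for one row of the Wikipedia chart.
sample = """<table><tr>
<th><a title="January 21, 2020">Jan 21</a></th>
<td>1</td><td> </td><td>
</td><td>3</td>
</tr></table>"""
print(getCases(BeautifulSoup(sample, "html.parser")))
```

Because `limit=` caps how many siblings `find_next_siblings` returns, the `column` counter and `break` from the question are no longer needed.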

Original question on Stack Overflow: https://stackoverflow.com/questions/62959171/
