python - Web scraping with BeautifulSoup 4

I am trying to access the HTML of the website forexfactory.com and return all span tags that have the class worse or better.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.forexfactory.com/#closed")

soup = BeautifulSoup(r.text, 'lxml')

table = soup.find("table", class_="calendar__table")

Wnews = []
Bnews = []
Tnews = []

for row in table.find_all('tr', class_='calendar__row--grey'):

    currency = row.find("td", class_="currency")
    # print(currency.prettify()) # before get text
    currency = currency.get_text(strip=True)

    actual = row.find("td", class_="actual")
    actual = actual.get_text(strip=True)

    impact = row.find("span", class_="worse")
    try:
        impactW = impact.get_text(strip=True)
    except AttributeError:
        continue

    impact2 = row.find("span", class_="better")
    try:
        impactB = impact2.get_text(strip=True)
    except AttributeError:
        continue

    # print(impact)

    # news.append(currency)
    # news.append(actual)

    if currency == "GBP":

        actual = row.find("td", class_="actual")
        actual = actual.get_text(strip=True)

        Tnews.append(currency)

        forecast = row.find("td", class_="forecast")
        forecast = forecast.get_text(strip=True)

        Wnews.append(impactW)
        Bnews.append(impactB)

        print(impact2)

print(impact2) returns multiple span tags with class = "Revised Better", not just the ones with class better. What did I get wrong?

Best answer

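To get all span tags with the class worse, try the code below. It uses a CSS selector and excludes spans that also carry the class revised.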

worsedata=[item.text.strip() for item in soup.select('table.calendar__table tr.calendar__row--grey span.worse:not(.revised)')]
print(worsedata)

Output:

['0.0%', '-0.2%', '-0.3%', '-1.7%', '0.1%', '-1.2%']
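
The reason `:not(.revised)` is needed can be shown on a small example. In BeautifulSoup, `class` is a multi-valued attribute, so `class_="better"` matches any tag whose class list contains `better`, including `class="revised better"`. The sketch below uses hand-written, hypothetical markup (not the real forexfactory.com page) and the built-in `html.parser`, so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the calendar rows; not the real page.
html = """
<table class="calendar__table">
  <tr class="calendar__row--grey">
    <td class="actual"><span class="better">1.9%</span></td>
    <td class="previous"><span class="revised better">0.4%</span></td>
  </tr>
  <tr class="calendar__row--grey">
    <td class="actual"><span class="worse">-0.2%</span></td>
    <td class="previous"><span class="revised worse">0.1%</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# class_="better" matches every span whose class list contains "better",
# so the "revised better" span is included as well.
loose = [s.get_text(strip=True) for s in soup.find_all("span", class_="better")]

# :not(.revised) drops the spans that also carry the "revised" class.
strict = [s.get_text(strip=True) for s in soup.select("span.better:not(.revised)")]

print(loose)   # ['1.9%', '0.4%']
print(strict)  # ['1.9%']
```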

To get only the span tags with the class better:

betterdata=[item.text.strip() for item in soup.select('table.calendar__table tr.calendar__row--grey span.better:not(.revised)')]
print(betterdata)

Output:

['1.9%', '-5.3B']
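
If the values are needed per row, as in the question's loop, the same selectors can be applied to each row. Note that the question's loop hits `continue` whenever a row lacks either a worse or a better span, so only rows containing both are ever printed; that may be why mostly revised spans showed up. The following is a minimal sketch under assumptions: the markup is hypothetical, and `text_or_none` is a helper name introduced here for illustration. It records `None` for a missing span instead of skipping the row.

```python
from bs4 import BeautifulSoup

# Hypothetical rows for illustration; the real page's markup may differ.
html = """
<table class="calendar__table">
  <tr class="calendar__row--grey">
    <td class="currency">GBP</td>
    <td class="actual"><span class="better">1.9%</span></td>
    <td class="previous"><span class="revised worse">0.3%</span></td>
  </tr>
  <tr class="calendar__row--grey">
    <td class="currency">GBP</td>
    <td class="actual"><span class="worse">-0.2%</span></td>
    <td class="previous">0.1%</td>
  </tr>
  <tr class="calendar__row--grey">
    <td class="currency">USD</td>
    <td class="actual">0.5%</td>
    <td class="previous">0.4%</td>
  </tr>
</table>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(html, "html.parser")

rows = []
for row in soup.select("table.calendar__table tr.calendar__row--grey"):
    currency = text_or_none(row.find("td", class_="currency"))
    # select_one returns the first match or None, so no try/except is needed.
    worse = text_or_none(row.select_one("span.worse:not(.revised)"))
    better = text_or_none(row.select_one("span.better:not(.revised)"))
    rows.append((currency, worse, better))

# Keep only the GBP rows, as the question's loop intended.
gbp_rows = [r for r in rows if r[0] == "GBP"]
print(gbp_rows)  # [('GBP', None, '1.9%'), ('GBP', '-0.2%', None)]
```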

Regarding python - Web scraping with BeautifulSoup 4, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59721267/
