I am trying to read the HTML of forexfactory.com and return all span tags that have the class worse or better.
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.forexfactory.com/#closed")
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find("table", class_="calendar__table")
Wnews = []
Bnews = []
Tnews = []
for row in table.find_all('tr', class_='calendar__row--grey'):
    currency = row.find("td", class_="currency")
    # print(currency.prettify()) # before get text
    currency = currency.get_text(strip=True)
    actual = row.find("td", class_="actual")
    actual = actual.get_text(strip=True)
    impact = row.find("span", class_="worse")
    try:
        impactW = impact.get_text(strip=True)
    except AttributeError:
        continue
    impact2 = row.find("span", class_="better")
    try:
        impactB = impact2.get_text(strip=True)
    except AttributeError:
        continue
    # print(impact)
    # news.append(currency)news.append(actual)
    if currency == "GBP":
        actual = row.find("td", class_="actual")
        actual = actual.get_text(strip=True)
        Tnews.append(currency)
        forecast = row.find("td", class_="forecast")
        forecast = forecast.get_text(strip=True)
        Wnews.append(impactW)
        Bnews.append(impactB)
    print(impact2)
print(impact2) returns span tags with class = "Revised Better" as well, not only the ones whose class is just better. What did I write wrong?
Best answer
To get all span tags with the class worse, try the code below. It uses a CSS selector.
worsedata=[item.text.strip() for item in soup.select('table.calendar__table tr.calendar__row--grey span.worse:not(.revised)')]
print(worsedata)
Output:
['0.0%', '-0.2%', '-0.3%', '-1.7%', '0.1%', '-1.2%']
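Why the original lookup also returned revised values: class is a multi-valued attribute, and Beautiful Soup's class_ filter matches a tag when any one of its classes equals the given string. The selector above avoids this with :not(.revised). A minimal sketch of the difference, run on a made-up HTML fragment (the markup is an assumption for illustration, not copied from forexfactory.com, and html.parser is used only so the example does not depend on lxml):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, loosely modelled on the calendar markup in the question.
html = """
<table class="calendar__table">
  <tr class="calendar__row--grey">
    <td class="actual"><span class="better">1.9%</span></td>
    <td class="previous"><span class="revised better">0.4%</span></td>
  </tr>
  <tr class="calendar__row--grey">
    <td class="actual"><span class="worse">-0.2%</span></td>
    <td class="previous"><span class="revised worse">-0.1%</span></td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# class_="better" matches any tag whose class list contains "better",
# so the "revised better" span is returned as well.
loose = [s.get_text(strip=True) for s in soup.find_all("span", class_="better")]

# The CSS selector form excludes spans that also carry the "revised" class.
strict = [s.get_text(strip=True) for s in soup.select("span.better:not(.revised)")]

print(loose)   # ['1.9%', '0.4%']
print(strict)  # ['1.9%']
```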
To get only the span tags with the class better:
betterdata=[item.text.strip() for item in soup.select('table.calendar__table tr.calendar__row--grey span.better:not(.revised)')]
print(betterdata)
Output:
['1.9%', '-5.3B']
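One more thing that may be worth checking in the question's loop: each except AttributeError: continue skips the row as soon as one of the two spans is missing, so a row is only processed when it contains both a worse and a better span, and one of those is then likely to be a revised one. Below is a sketch that looks at each row once and tolerates a missing span. It runs on a made-up HTML fragment (an assumption, not the real page), and classify_rows is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

def classify_rows(html):
    """Return (currency, label, value) for each non-revised better/worse span."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("table.calendar__table tr.calendar__row--grey"):
        currency = row.select_one("td.currency")
        if currency is None:
            continue  # not a data row
        for label in ("better", "worse"):
            # select_one returns None when nothing matches, so no try/except is needed.
            span = row.select_one(f"span.{label}:not(.revised)")
            if span is not None:
                results.append(
                    (currency.get_text(strip=True), label, span.get_text(strip=True))
                )
    return results

# Hypothetical sample markup for illustration only.
sample = """
<table class="calendar__table">
  <tr class="calendar__row--grey">
    <td class="currency">GBP</td>
    <td class="actual"><span class="better">1.9%</span></td>
  </tr>
  <tr class="calendar__row--grey">
    <td class="currency">USD</td>
    <td class="actual"><span class="worse">-0.2%</span></td>
    <td class="previous"><span class="revised better">0.1%</span></td>
  </tr>
</table>
"""

rows = classify_rows(sample)
print(rows)  # [('GBP', 'better', '1.9%'), ('USD', 'worse', '-0.2%')]
```

Keeping the currency next to each value makes it straightforward to filter for a single currency such as "GBP" afterwards.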
Source: the Stack Overflow question "python - Web scraping with BeautifulSoup 4": https://stackoverflow.com/questions/59721267/