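I am using the code below and trying to find the value at the end of the href. Is there a way to extract the href and get the value after page= with BeautifulSoup or a regex?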
from bs4 import BeautifulSoup
import requests
import json
import re
request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
findNext = soup.find("a", class_="next_page")
print(findNext)
This gives the following output:
<a class="next_page" href="/quotes/tag/fun?page=2" rel="next">next »</a>
Note: I want to extract the 2 from the output above, or whatever other number appears in its place.
Best answer
You can use a regular expression to find the page number:
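from bs4 import BeautifulSoup
import re
import requests
request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
# The lookbehind (?<=page=) matches only digits that directly follow "page=" in the tag's HTML.
page_num = re.findall(r'(?<=page=)\d+', str(soup.find("a", class_="next_page")))[0]
print(page_num)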
Output:
2
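As an alternative to the regular expression, the href can be treated as a URL and its query string parsed with the standard library. This is only a minimal sketch: the helper name page_from_href is hypothetical (it is not part of BeautifulSoup), and it assumes the href has already been read from the tag, for example with findNext["href"].

```python
from urllib.parse import urlparse, parse_qs


def page_from_href(href):
    """Return the `page` query parameter as an int, or None if it is absent.

    Hypothetical helper for illustration; raises ValueError if the value is not numeric.
    """
    query = parse_qs(urlparse(href).query)
    values = query.get("page")
    return int(values[0]) if values else None


# The href below is copied from the question's output; with the tag in hand
# it would be read as findNext["href"].
print(page_from_href("/quotes/tag/fun?page=2"))
```

Parsing the query string rather than the tag's full HTML keeps working if other parameters are added to the link, and returning None covers the case where there is no page parameter at all.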
Regarding "javascript - Beautiful Soup/Regex: Find specific value from href", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48470114/