python - Finding the number of pages on a website with a soup.findall unicode problem

Tags: python string unicode beautifulsoup findall

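Hi, I am trying to find the number of pages on a website using Python 2.7 and BeautifulSoup. I tried the following code to get the page count from the pagination row.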

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2

from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
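req = urllib2.Request("https://www.sikayetvar.com", None, headers)
resp = urllib2.urlopen(req)
html = resp.read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
# find_all returns a list-like ResultSet of the matching <div> tags
pages = soup.find_all('div', attrs={'class': 'pagination row'})
for page in pages:
    # page.text is the text of the whole row as a single unicode string
    print page.text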

The output looks like this: 1 2 3 4 5 6 7 ... 807

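I only need the number 807, but what I get back from soup.find_all is unicode (I checked this with type). Should I convert it to a string and find the largest number? In that case I think the (...) would cause a problem. Or should I try to take the last element of the find_all result? But that is not a list either, it is unicode. I really need some help, thanks.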

Best answer

I couldn't install urllib, so I will use the requests library instead. You can install it with pip install requests.

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

response = requests.get("https://www.sikayetvar.com/a101", headers=headers)

soup = BeautifulSoup(response.text, 'lxml')

# This gives you all <a> tags inside the div that has the pagination class
pages = soup.select('div.pagination a')

# The last element is the "next page" link; the one before it is the last page number.
# So we take the second-to-last item.
print(pages[-2].text)
# Output is 807
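
The question also asks about turning the text into a string and taking the largest number, with the worry that the "..." would get in the way. As a rough sketch of that alternative (not part of the original answer), a regular expression can pull out only the digit runs, so the ellipsis and any arrow links are simply skipped. The helper name last_page_number and the sample string are made up for illustration; the sample just mirrors the output shown in the question.

```python
import re


def last_page_number(link_texts):
    """Return the largest integer found in the given pagination strings, or None."""
    numbers = []
    for text in link_texts:
        # \d+ matches runs of digits, so "...", arrows and whitespace are ignored.
        numbers.extend(int(n) for n in re.findall(r"\d+", text))
    return max(numbers) if numbers else None


# Hypothetical sample mirroring the output shown in the question.
sample = [u"1 2 3 4 5 6 7 ... 807"]
print(last_page_number(sample))
```

With the code above, something like last_page_number(a.text for a in pages) should work the same way, and it does not depend on the last page link always being the second-to-last element. It does assume that the only numbers in the pagination row are page numbers.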

Regarding "python - Finding the number of pages on a website with a soup.findall unicode problem", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51836275/
