Python - How to get all instances on a page, not just the first one

Tags: python web-scraping beautifulsoup

Using findAll raises the error "TypeError: list indices must be integers, not str", while using .find does not. Using findall raises the error "TypeError: 'NoneType' object is not callable".

What is the correct way to locate all links on the page with the class "frame" (not just the first instance)?

import requests
from bs4 import BeautifulSoup

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/2/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/3/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/4/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/5/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/6/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/7/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/8/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

url = ("http://www.gym-directory.com/listing-category/gyms-fitness-centres/page/9/")
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print soup.findAll("a",{"class":"frame"})["href"]

Best answer

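The problem is that soup.findAll() returns a list, and you are trying to access that list with ["href"]. A list can only be indexed by integer position, which is what the TypeError is complaining about; "href" has to be read from each element of the list instead.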
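
To see where both of the question's errors come from, here is a deliberately simplified, hypothetical model in Python 3. `ToySoup` and the sample hrefs are invented for illustration and are not bs4 code; the model only reproduces two bs4 behaviours: `findAll` returns a list, and an unknown attribute such as `soup.findall` (lowercase) is treated by bs4 as `soup.find("findall")`, a search for a `<findall>` tag that matches nothing and returns `None`. Python 3 words the first message as "list indices must be integers or slices, not str"; Python 2, as in the question, omits "or slices".

```python
class ToySoup:
    """Hypothetical, heavily simplified model of a parsed page; not bs4's real code."""

    def __init__(self, hrefs):
        # Each "tag" is a dict so that tag["href"] works the way it does on a bs4 Tag.
        self._tags = [{"href": h} for h in hrefs]

    def findAll(self, name, attrs=None):
        # Like bs4's findAll: returns a list of every match (filters are ignored here).
        return list(self._tags)

    def find(self, name, attrs=None):
        # Like bs4's find: returns only the first match, or None if nothing matches.
        return self._tags[0] if self._tags else None

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. bs4 resolves an unknown
        # name such as soup.findall as soup.find("findall"), i.e. a search for a
        # <findall> tag, which matches nothing and therefore gives None.
        if name.startswith("__"):
            raise AttributeError(name)
        return None


soup = ToySoup(["/gym-a/", "/gym-b/"])

# .find gives one tag, so indexing it with "href" works.
print(soup.find("a")["href"])

# .findAll gives a list, and a list cannot be indexed with a string.
try:
    soup.findAll("a")["href"]
except TypeError as exc:
    print("findAll:", exc)

# soup.findall is None, and None cannot be called.
try:
    soup.findall("a")
except TypeError as exc:
    print("findall:", exc)
```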

What you need to do is loop over the list:

for elem in soup.findAll("a", {"class": "frame"}):
    # elem is a single Tag, so it can be indexed by attribute name
    print(elem["href"])
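
The question's code also repeats the same fetch-and-parse block nine times. A loop over the page numbers removes that repetition. Below is a minimal Python 3 sketch, assuming the listing still lives at the URLs from the question (page 1 at the bare category URL, pages 2 to 9 under /page/N/) and that the links still carry class="frame". The names `page_urls`, `frame_hrefs` and `scrape` are hypothetical; `requests` and `bs4` are imported inside `scrape` so the two helpers can be reused without those packages installed.

```python
# Hypothetical helpers; the URL pattern and page count (1-9) come from the question.
BASE_URL = "http://www.gym-directory.com/listing-category/gyms-fitness-centres/"


def page_urls(last_page):
    """Page 1 is the bare category URL; pages 2..last_page live under /page/N/."""
    return [BASE_URL] + [BASE_URL + "page/%d/" % n for n in range(2, last_page + 1)]


def frame_hrefs(soup):
    """Return the href of every <a class="frame"> on one parsed page, not just the first."""
    # findAll returns a list of Tags, so "href" is read from each Tag, never from the list.
    return [a["href"] for a in soup.findAll("a", {"class": "frame"})]


def scrape(last_page=9):
    """Fetch each listing page and collect the frame links from all of them."""
    # Third-party imports are deferred so page_urls/frame_hrefs work without them.
    import requests
    from bs4 import BeautifulSoup

    links = []
    for url in page_urls(last_page):
        r = requests.get(url, timeout=10)
        r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        soup = BeautifulSoup(r.text, "html.parser")
        links.extend(frame_hrefs(soup))
    return links
```

Usage: `for href in scrape(): print(href)` prints the href of every frame link across all nine pages, one per line, in page order.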

Regarding "Python - How to get all instances on a page, not just the first one", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32814578/
