python - Can't make use of a decorator to handle two links with different pagination

Tags: python python-3.x function web-scraping decorator

I've written a script in Python that uses two different links (one with pagination and one without) to see whether my script can fetch all of the next-page links. If there is no pagination option, the script must print the line No pagination found.

I've applied the @check_pagination decorator to check whether pagination exists, and I'd like to keep this decorator in my scraper.

I've already achieved the above with the following:

import requests
from bs4 import BeautifulSoup

urls = [
        "https://www.mobilehome.net/mobile-home-park-directory/maine/all",
        "https://www.mobilehome.net/mobile-home-park-directory/rhode-island/all"
    ]

def check_pagination(f):
    def wrapper(lead):
        if not lead.pages:
            print('No pagination found')
        return f(lead)
    return wrapper

class LinkScraper:
    def __init__(self, url):
        self.url = url
        self.home_page = requests.get(self.url).text
        self.soup = BeautifulSoup(self.home_page, "lxml")
        self.pages = [item.text for item in self.soup.find('div', {'class':'pagination'}).find_all('a')][:-1]

    @check_pagination
    def __iter__(self):
        for p in self.pages:
            link = requests.get(f'{self.url}/page/{p}')
            yield link.url

for url in urls:
    d = [page for page in LinkScraper(url)]
    print(d)

Now I'd like to do the same thing without using a class, while keeping the decorator in the script to check for pagination. But it seems I've gone wrong somewhere with the decorator, which is why it doesn't print No pagination found even when a link has no pagination. Any help fixing this would be highly appreciated.

import requests
from bs4 import BeautifulSoup

urls = [
        "https://www.mobilehome.net/mobile-home-park-directory/maine/all",
        "https://www.mobilehome.net/mobile-home-park-directory/rhode-island/all"
    ]

def check_pagination(f):
    def wrapper(*args,**kwargs):
        if not f(*args,**kwargs): 
            print("No pagination found")
        return f(*args,**kwargs)
    return wrapper

def get_base(url):
    page = requests.get(url).text
    soup = BeautifulSoup(page,"lxml")
    return [item.text for item in soup.find('div', {'class':'pagination'}).find_all('a')][:-1]

@check_pagination
def get_links(num):
    link = requests.get(f'{url}/page/{num}')
    return link.url

if __name__ == '__main__':
    for url in urls:
        links = [item for item in get_base(url)]
        for link in links:
            print(get_links(link))

Best answer

Simply apply the decorator to get_base instead: get_base is what returns the list of pagination links, so an empty list there should trigger the message, whereas get_links always returns a URL string and is therefore never falsy. (Note also that your wrapper called f twice; call it once and reuse the result.)

def check_pagination(f):
    def wrapper(*args, **kwargs):
        result = f(*args, **kwargs)
        if not result:
            print("No pagination found")
        return result
    return wrapper

@check_pagination
def get_base(url):
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    return [item.text for item in soup.find('div', {'class':'pagination'}).find_all('a')][:-1]

def get_links(num):
    link = requests.get(f'{url}/page/{num}')
    return link.url

if __name__ == '__main__':
    for url in urls:
        links = [item for item in get_base(url)]
        for link in links:
            print(get_links(link))
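To see the decorator's behavior without any network access, here is a minimal sketch in which fake_get_base is a made-up stand-in for get_base, returning hard-coded page lists instead of scraping real URLs:

```python
def check_pagination(f):
    def wrapper(*args, **kwargs):
        # Call the wrapped function once and reuse the result.
        result = f(*args, **kwargs)
        if not result:
            print("No pagination found")
        return result
    return wrapper

@check_pagination
def fake_get_base(url):
    # Hypothetical stand-in: pretend only the first key has pagination links.
    pages = {"with-pages": ["2", "3"], "without-pages": []}
    return pages.get(url, [])

print(fake_get_base("with-pages"))     # ['2', '3']
print(fake_get_base("without-pages"))  # prints "No pagination found", then []
```

An empty list is falsy in Python, so the decorator fires exactly when the wrapped function finds no pagination links.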

Regarding python - Can't make use of a decorator to handle two links with different pagination, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53638221/
