Python, Scrapy, Selenium: how to attach webdriver to "response" passed into a function to use it for further action

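I am trying to use Selenium to get the value of the selected option from a drop-down list inside a Scrapy spider, but I am not sure how to go about it. This is my first time working with Selenium.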

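As you can see in the code below, I create a request in the parse function that calls parse_page as its callback. In parse_page I want to extract the value of the selected option. I don't know how to attach the webdriver to the response page passed to parse_page so that I can use it with Select. The code I wrote below is obviously wrong :(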

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.exceptions import CloseSpider
import logging
import scrapy
from scrapy.utils.response import open_in_browser
from scrapy.http import FormRequest
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from activityadvisor.items import TruYog

logging.basicConfig()
logger = logging.getLogger()

class TrueYoga(Spider):
    name = "trueyoga"
    allowed_domains = ["trueyoga.com.sg","trueclassbooking.com.sg"]
    start_urls = [
        "http://trueclassbooking.com.sg/frames/class-schedules.aspx",
    ]

    def parse(self, response):

        clubs=[]
        clubs = Selector(response).xpath('//div[@class="club-selections"]/div/div/div/a/@rel').extract()
        clubs.sort()
        print 'length of clubs = ' , len(clubs), '1st content of clubs = ', clubs
        req=[]
        for club in clubs:
            payload = {'ctl00$cphContents$ddlClub':club}
            req.append(FormRequest.from_response(response,formdata = payload, dont_click=True, callback = self.parse_page))
        for request in req:
            yield request

    def parse_page(self, response):
        driver = webdriver.Firefox()
        driver.get(response)  # does not work: get() expects a URL string, not a Response object
        clubSelect = Select(driver.find_element_by_id("ctl00_cphContents_ddlClub"))
        option = clubSelect.first_selected_option
        print option.text

Is there a way to get this option value in Scrapy without using Selenium? So far my searches on Google and Stack Overflow have not turned up any useful answers.

Thanks for your help!

Best answer

I would suggest using a Downloader Middleware to pass the Selenium response to the spider's parse method. Take a look at the example I wrote in my answer to another question.

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/31284069/
