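I'm trying to use Selenium to get the value of the selected option in a dropdown list from within a Scrapy spider, but I'm not sure how to go about it. This is my first time working with Selenium.
As you can see in the code below, I create a request in the parse function that calls parse_page as its callback. In parse_page I want to extract the value of the selected option. I don't know how to attach the webdriver to the response page passed to parse_page so that I can use it with Select. Below is my attempt, which is clearly not right :(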
from __future__ import print_function
import logging

from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.http import FormRequest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from activityadvisor.items import TruYog

logging.basicConfig()
logger = logging.getLogger()


class TrueYoga(Spider):
    name = "trueyoga"
    allowed_domains = ["trueyoga.com.sg", "trueclassbooking.com.sg"]
    start_urls = [
        "http://trueclassbooking.com.sg/frames/class-schedules.aspx",
    ]

    def parse(self, response):
        # Collect the club IDs from the rel attributes of the club links
        clubs = Selector(response).xpath('//div[@class="club-selections"]/div/div/div/a/@rel').extract()
        clubs.sort()
        print('length of clubs =', len(clubs), '1st content of clubs =', clubs)
        # Submit the ASP.NET form once per club; each result is handled by parse_page
        for club in clubs:
            payload = {'ctl00$cphContents$ddlClub': club}
            yield FormRequest.from_response(response, formdata=payload,
                                            dont_click=True, callback=self.parse_page)

    def parse_page(self, response):
        driver = webdriver.Firefox()
        try:
            # driver.get() expects a URL string, not a Scrapy Response object.
            # It issues a fresh GET request, so the POSTed form state is not
            # reproduced and the dropdown shows the page's default selection.
            driver.get(response.url)
            clubSelect = Select(driver.find_element(By.ID, "ctl00_cphContents_ddlClub"))
            option = clubSelect.first_selected_option
            print(option.text)
        finally:
            # Always close the browser, even if the element lookup fails
            driver.quit()
Is there a way to get this option value in Scrapy without using Selenium? So far my searches on Google and Stack Overflow haven't turned up anything useful.
Thanks for your help!
Best answer
I'd suggest using a Downloader Middleware to pass the Selenium response to the spider's parse method. Take a look at the example I wrote in an answer to another question.
Source: the Stack Overflow question "Python, Scrapy, Selenium: how to attach webdriver to "response" passed into a function to use it for further action": https://stackoverflow.com/questions/31284069/