python - Unable to scrape a class in TripAdvisor using Selenium

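I am trying to scrape all the images from a particular TripAdvisor page, but when I use the find_elements_by_class_name function in Selenium it returns no values. I am confused, because this is the exact class name, as shown on the site, of the elements I want to iterate over and append to a list. Any help would be greatly appreciated!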

# importing dependencies
import re
import selenium
import io
import pandas as pd
import urllib.request
import urllib.parse
import requests
from bs4 import BeautifulSoup
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
import time
from datetime import datetime
from selenium.webdriver.common.keys import Keys


#setup opening url window of website to be scraped
options = webdriver.ChromeOptions()
options.headless=False
prefs = {"profile.default_content_setting_values.notifications" : 2} 
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome("/Users/rishi/Downloads/chromedriver 3") #possible issue by not including the file extension
driver.maximize_window()
time.sleep(5)
driver.get("""https://www.tripadvisor.com/""") #get the information from the page

#automate searching for hotels in specific city
driver.find_element_by_xpath('/html/body/div[2]/div/div[6]/div[1]/div/div/div/div/span[1]/div/div/div/a').click() #clicks on hotels option
driver.implicitly_wait(12) #allows xpath to be found
driver.find_element_by_xpath('//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[12]/div/div/div[1]/div[1]/div/input').send_keys("Washington D.C.", Keys.ENTER) #change string to get certain city
time.sleep(8)

#now get current url
url = driver.current_url

response = requests.get(url)
response = response.text
data = BeautifulSoup(response, 'html.parser')

#get list of all hotels
hotels = driver.find_elements_by_class_name("prw_rup prw_meta_hsx_responsive_listing ui_section listItem")

print("Total Number of Hotels: ", len(hotels))

Best answer

If you are using Selenium, I suggest not using BeautifulSoup alongside it, because Selenium alone can get everything you need.

You can achieve your goal simply, as follows:

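import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Selenium 3 style API, as in the question; the path is the asker's local chromedriver
driver = webdriver.Chrome("/Users/rishi/Downloads/chromedriver 3")
driver.maximize_window()

# open the hotels page directly instead of clicking through from the home page
driver.get("https://www.tripadvisor.ca/Hotels")
time.sleep(1)

# element lookups keep retrying for up to 12 seconds
driver.implicitly_wait(12)
# type the city into the search box and submit it
driver.find_element_by_xpath('//*[@class="typeahead_input"]').send_keys("Washington D.C.", Keys.ENTER)
time.sleep(1)
# match elements whose class attribute is exactly "listing collapsed"
hotels = driver.find_elements_by_xpath('//*[@class="listing collapsed"]')

print("Total Number of Hotels: ", len(hotels))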
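A likely reason the call in the question found nothing: find_elements_by_class_name expects a single class name, while the string passed there contains four space-separated classes. One way around this is to join the classes into a CSS selector that requires all of them. The sketch below is only an illustration; the helper name classes_to_css_selector is hypothetical and not part of Selenium, and whether those classes still exist on the live page is not verified here.

```python
def classes_to_css_selector(class_attr: str) -> str:
    """Turn a space-separated class attribute into a CSS selector
    that matches elements carrying all of the listed classes."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" selects elements that have both class "a" and class "b"
    return "".join("." + name for name in names)


# the class string from the question
selector = classes_to_css_selector(
    "prw_rup prw_meta_hsx_responsive_listing ui_section listItem"
)
print(selector)

# With a live driver (Selenium 3 API, as used in this thread) the lookup would be:
# hotels = driver.find_elements_by_css_selector(selector)
# It is left commented out because it needs a running browser session.
```

Unlike the XPath comparison `@class="..."`, which requires the whole attribute to match character for character, a chained class selector still matches when the element carries extra classes or lists them in a different order.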

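Note that the "listing collapsed" lookup returns only the first 30 hotels (that is, the first page). To get every hotel, you need to loop through all of the result pages for the given city.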
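The page loop described above can be sketched without tying it to a particular locator. In this minimal sketch, collect_all_pages, read_page and go_to_next_page are hypothetical names: with Selenium, read_page would wrap the find_elements_by_xpath call, and go_to_next_page would click the site's "next" control and return False when there is none. The locator for that control is not given in this thread, so the demo uses in-memory fake pages instead of a browser.

```python
from typing import Callable, List


def collect_all_pages(
    read_page: Callable[[], List[str]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 100,
) -> List[str]:
    """Gather items from the current page, then advance until no page is left.

    max_pages is a safety cap so a broken "next" check cannot loop forever.
    """
    results: List[str] = []
    for _ in range(max_pages):
        results.extend(read_page())
        if not go_to_next_page():
            break
    return results


# Fake two-page result set standing in for the browser (illustrative data only)
pages = [["Hotel A", "Hotel B"], ["Hotel C"]]
position = {"index": 0}


def read_page() -> List[str]:
    return pages[position["index"]]


def go_to_next_page() -> bool:
    if position["index"] + 1 >= len(pages):
        return False
    position["index"] += 1
    return True


all_hotels = collect_all_pages(read_page, go_to_next_page)
print("Total Number of Hotels: ", len(all_hotels))
```

Passing the two steps in as callables keeps the loop independent of the page markup, so only the two small wrappers need to change if the site's layout does.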

Hope this helps.

For "python - Unable to scrape a class in TripAdvisor using Selenium", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60312171/
