python - How do I scroll properly on a dynamically loaded page with Selenium?

Here is the website link: website

I want the links of all the hotels at this location.

Here is my script:

import pandas as pd
import numpy as np
from selenium import webdriver
import time

PATH = "driver\chromedriver.exe"

options = webdriver.ChromeOptions() 
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')


driver = webdriver.Chrome(options=options, executable_path=PATH)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

cookie = driver.find_element_by_xpath('//button[@class="uolsaJ"]')
try:
    cookie.click()
except:
    pass

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)

time.sleep(5)

my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')

links = [my_elem.get_attribute("href") for my_elem in my_elems]


X = np.array(links)
print(X.shape)
#driver.close()

But I can't find a way to tell the script: scroll down until there is nothing left to scroll.

I tried changing these parameters:

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(30)

I changed time.sleep(), the number 1000, and so on, but my output keeps changing, and not in the right way.

output

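As you can see, I scrape a different number of links on each run. How can I make my script scrape the same amount every time? It doesn't have to be every link, but I want a stable number in the end.

Here it scrolls, and at some point it seems to get stuck and scrapes whatever links it has at that moment. That's not right.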

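Best answer

There are several issues here.

You collect the elements and their links only after scrolling has finished, whereas you should do it inside the scrolling loop.
You should wait for the cookie alert to appear before closing it.
You can keep scrolling until the footer element is displayed.
Something like this: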

import time

import numpy as np
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslash in the Windows path is taken literally.
PATH = r"driver\chromedriver.exe"

options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')

# Selenium 3 style. In Selenium 4 the driver path is passed as
# service=Service(PATH) (from selenium.webdriver.chrome.service) instead.
driver = webdriver.Chrome(options=options, executable_path=PATH)
wait = WebDriverWait(driver, 20)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

# Wait for the cookie banner to appear, then dismiss it.
wait.until(EC.visibility_of_element_located((By.XPATH, '//button[@class="uolsaJ"]'))).click()

def is_element_visible(xpath):
    """Return True if the element becomes visible within 2 seconds."""
    wait1 = WebDriverWait(driver, 2)
    try:
        wait1.until(EC.visibility_of_element_located((By.XPATH, xpath)))
        return True
    except TimeoutException:
        return False

# Keep scrolling until the footer is visible, reading the links on every pass.
while not is_element_visible("//footer[@id='footer']"):
    my_elems = driver.find_elements(By.XPATH, '//a[@class="_61P-R0"]')

    links = [my_elem.get_attribute("href") for my_elem in my_elems]

    X = np.array(links)
    print(X.shape)

    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)


#driver.close()
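
The loop above rebuilds the list of links on every pass, so only the last pass is kept. If the page drops or replaces result cards while scrolling (an assumption about this site, not something the question confirms), it may help to accumulate unique links across passes instead. Here is a minimal sketch; the helper name collect_links and its is_done hook are hypothetical and not part of Selenium:

```python
import time


def collect_links(driver, link_xpath, is_done, step=1000, pause=5.0, max_rounds=100):
    """Scroll in steps and accumulate the unique hrefs seen along the way.

    is_done is a zero-argument callable (hypothetical hook) that returns
    True once scrolling should stop, e.g. a footer-visibility check.
    max_rounds is a safety cap so the loop cannot run forever.
    """
    seen = {}  # a dict keeps insertion order, so links stay in page order
    for _ in range(max_rounds):
        # "xpath" is the string value behind Selenium's By.XPATH constant.
        for elem in driver.find_elements("xpath", link_xpath):
            href = elem.get_attribute("href")
            if href:  # skip anchors without an href
                seen.setdefault(href, None)
        if is_done():
            break
        driver.execute_script(f"window.scrollBy(0, {step})")
        time.sleep(pause)
    return list(seen)
```

With the helper from the answer, this could be called as collect_links(driver, '//a[@class="_61P-R0"]', lambda: is_element_visible("//footer[@id='footer']")). Because duplicates are ignored, the count can only grow from pass to pass, which should make the final number more stable between runs.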

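The footer check relies on the footer only showing up once the results run out, which is specific to this site. A more generic way to express "scroll until there is nothing left to scroll" is to compare document.body.scrollHeight before and after each scroll and stop once it no longer grows. This is a different stop condition from the one in the answer, sketched here under the assumption that new results make the page taller; the function name scroll_until_stable and its parameters are hypothetical:

```python
import time


def scroll_until_stable(driver, pause=2.0, stable_rounds=3, max_rounds=200):
    """Scroll to the bottom until the page height stops growing.

    Returns True if the height stayed unchanged for stable_rounds
    consecutive checks, False if max_rounds was reached first.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    unchanged = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded results time to arrive
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            unchanged += 1
            if unchanged >= stable_rounds:
                return True
        else:
            unchanged = 0
            last_height = height
    return False
```

Requiring several unchanged checks in a row guards against stopping early when one batch of results is slow to load. The pause and stable_rounds values are guesses and would need tuning for the actual page.
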
A similar question about python - How do I scroll properly on a dynamically loaded page with Selenium? can be found on Stack Overflow: https://stackoverflow.com/questions/68127521/
