python - Not scraping all requested data except one CSS list on a page

Tags: python css selenium xpath web-scraping

I am trying to scrape a web page, but although the CSS I supply matches in Chrome, Selenium does not scrape all the data: it only scrapes the odds from the first page, as shown below, and then raises an error.

I have re-tested the CSS and changed it several times, yet Selenium with Python still fails to scrape the data correctly.

The error I get is:

Traceback (most recent call last):
  File "C:/Users/Bain3/PycharmProjects/untitled4/Vpalmerbet1.py", line 1365, in <module>
    EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]'))))
  File "C:\Users\Bain3\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

I have tried changing the CSS and also switching to XPath:

#clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]'))))

clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, ("//*[@class='match-pop-market']//a[href*='/sports/soccer/']"))))
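
A quick way to see which locator is at fault (a minimal diagnostic sketch, assuming the soccer page is already loaded in driver) is to run both through find_elements directly: the CSS form returns the matching links once they render, while the XPath form is rejected outright, because [href*='...'] is CSS attribute syntax, not XPath:

from selenium.common.exceptions import InvalidSelectorException

# CSS form: valid; returns every matching link currently in the DOM
css_hits = driver.find_elements_by_css_selector(
    '.match-pop-market a[href*="/sports/soccer/"]')
print(len(css_hits))

# XPath form: [href*='...'] is not XPath syntax, so the query itself fails
try:
    driver.find_elements_by_xpath(
        "//*[@class='match-pop-market']//a[href*='/sports/soccer/']")
except InvalidSelectorException as e:
    print("Invalid XPath locator:", e)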

You can see that Chrome's inspector finds elements matching this CSS:

[screenshot: Chrome DevTools showing the matched elements]

My full code is:

import os
import time
import csv
from random import shuffle

from selenium import webdriver
driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()

try:
    os.remove('vtg121.csv')
except OSError:
    pass

driver.get('https://www.palmerbet.com/sports/soccer')

#SCROLL_PAUSE_TIME = 0.5


from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

#clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, ('//*[@id="TopPromotionBetNow"]'))))
#if driver.find_element_by_css_selector('#TopPromotionBetNow'):
    #driver.find_element_by_css_selector('#TopPromotionBetNow').click()

#last_height = driver.execute_script("return document.body.scrollHeight")

#while True:

    #driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


    #time.sleep(SCROLL_PAUSE_TIME)


    #new_height = driver.execute_script("return document.body.scrollHeight")
    #if new_height == last_height:
        #break
    #last_height = new_height

time.sleep(1)

clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, ('//*[contains(@class,"filter_labe")]'))))
clickMe.click()
time.sleep(0)
clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'(//*[contains(@class,"filter_labe")])')))
options = driver.find_elements_by_xpath('//*[contains(@class,"filter_labe")]')

indexes = [index for index in range(len(options))]
shuffle(indexes)
for index in indexes:
    time.sleep(0)
    #driver.get('https://www.bet365.com.au/#/AS/B1/')
    clickMe1 = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'(//ul[@id="tournaments"]//li//input)[%s]' % str(index + 1))))
    clickMe1.click()
    time.sleep(0)
    ##tournaments > li > input
    #//*[@id='tournaments']//li//input

    # Team

#clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,("#mta_row td:nth-child(1)"))))
langs3 = driver.find_elements_by_css_selector("#mta_row td:nth-child(1)")
langs3_text = []

for lang in langs3:
    print(lang.text)

    langs3_text.append(lang.text)
time.sleep(0)

# Team ODDS
langs = driver.find_elements_by_css_selector("#mta_row .mpm_teams_cell_click:nth-child(2) .mpm_teams_bet_val")
langs_text = []

for lang in langs:
    print(lang.text)
    langs_text.append(lang.text)
time.sleep(0)


# HREF
#langs2 = driver.find_elements_by_xpath("//ul[@class='runners']//li[1]")
#a[href*="/sports/soccer/"]
#url1 = driver.current_url

#clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]'))))
# Both this XPath variant and the CSS variant above time out here:
clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, ("//*[@class='match-pop-market']//a[href*='/sports/soccer/']"))))
elems = driver.find_elements_by_css_selector('.match-pop-market a[href*="/sports/soccer/"]')
elem_href = []
for elem in elems:
    print(elem.get_attribute("href"))
    elem_href.append(elem.get_attribute("href"))


print(("NEW LINE BREAK"))
import sys
import io


with open('vtg121.csv', 'a', newline='', encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs_text, langs3_text, elem_href):
        writer.writerow(row)
        print(row)
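
Since this step only needs to collect every matching href rather than click one element, a wait for the whole list may be a better fit than element_to_be_clickable. A minimal sketch, reusing the driver, wait, By, and EC names from the code above:

# Wait until the matching links are present, then read them all;
# presence_of_all_elements_located returns the full list of matches.
elems = wait(driver, 15).until(EC.presence_of_all_elements_located(
    (By.CSS_SELECTOR, '.match-pop-market a[href*="/sports/soccer/"]')))
elem_href = [elem.get_attribute("href") for elem in elems]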

Best Answer

Your XPath is incorrect. Note that a predicate like [href*="/sports/soccer/"] is CSS attribute-selector syntax and works in a CSS selector, but it is not valid in XPath; there you should use [contains(@href, "/sports/soccer/")] (with @ to address the attribute). So the complete line should be:

from selenium.common.exceptions import TimeoutException

try:
    clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='match-pop-market']//a[contains(@href, '/sports/soccer/')]")))
    clickMe.click()
except TimeoutException:
    print("No link was found")

This answer to "python - Not scraping all requested data except one CSS list on a page" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/46985895/
