I am trying to scrape a web page, but even though the CSS selector is correct when I test it in Chrome, Selenium does not scrape all of the data: it only picks up the odds on the first page, as shown below, and then raises an error.
I have re-tested the CSS and changed it several times, yet Selenium with Python still fails to scrape the data correctly.
I also get:
Traceback (most recent call last):
File "C:/Users/Bain3/PycharmProjects/untitled4/Vpalmerbet1.py", line 1365, in <module>
EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]'))))
File "C:\Users\Bain3\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
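(The empty Message: above is just WebDriverWait's default; until() also accepts an optional message argument, so the same wait can be made to fail with a self-describing error. A minimal sketch, reusing the wait, By and EC names from the code below:)

    clickMe = wait(driver, 15).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, '.match-pop-market a[href*="/sports/soccer/"]')),
        message='no clickable link under .match-pop-market')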
I have tried changing the CSS as well as switching to XPath:
#clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]'))))
clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, ("//*[@class='match-pop-market']//a[href*='/sports/soccer/']"))))
You can see that Chrome's inspector does find this CSS selector.
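One way to check that discrepancy independently of the wait is to ask the live DOM how many nodes the selector matches at the moment the script runs; a count of zero would mean the elements are simply not there when Selenium looks. A small diagnostic sketch, using the same selector:

    count = driver.execute_script(
        'return document.querySelectorAll(\'.match-pop-market a[href*="/sports/soccer/"]\').length')
    print(count)  # 0 here means the live DOM has no such links yet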
My full code is:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait  # the code below calls this "wait"
from selenium.webdriver.support import expected_conditions as EC
import csv
import os
import time
from random import shuffle

driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()

# Start each run with a fresh output file.
try:
    os.remove('vtg121.csv')
except OSError:
    pass

driver.get('https://www.palmerbet.com/sports/soccer')

#SCROLL_PAUSE_TIME = 0.5
#clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="TopPromotionBetNow"]')))
#if driver.find_element_by_css_selector('#TopPromotionBetNow'):
#    driver.find_element_by_css_selector('#TopPromotionBetNow').click()
#last_height = driver.execute_script("return document.body.scrollHeight")
#while True:
#    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#    time.sleep(SCROLL_PAUSE_TIME)
#    new_height = driver.execute_script("return document.body.scrollHeight")
#    if new_height == last_height:
#        break
#    last_height = new_height

time.sleep(1)

# Open the tournament filter.
clickMe = wait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[contains(@class,"filter_labe")]')))
clickMe.click()

# Gather every tournament checkbox and work through them in random order.
options = driver.find_elements_by_xpath('//*[contains(@class,"filter_labe")]')
indexes = list(range(len(options)))
shuffle(indexes)

for index in indexes:
    clickMe1 = wait(driver, 10).until(EC.element_to_be_clickable(
        (By.XPATH, '(//ul[@id="tournaments"]//li//input)[%s]' % str(index + 1))))
    clickMe1.click()

    # Team names
    langs3 = driver.find_elements_by_css_selector("#mta_row td:nth-child(1)")
    langs3_text = []
    for lang in langs3:
        print(lang.text)
        langs3_text.append(lang.text)

    # Team odds
    langs = driver.find_elements_by_css_selector(
        "#mta_row .mpm_teams_cell_click:nth-child(2) .mpm_teams_bet_val")
    langs_text = []
    for lang in langs:
        print(lang.text)
        langs_text.append(lang.text)

    # Match links: this is the wait that raises the TimeoutException.
    #clickMe = wait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.match-pop-market a[href*="/sports/soccer/"]')))
    clickMe = wait(driver, 15).until(EC.element_to_be_clickable(
        (By.XPATH, "//*[@class='match-pop-market']//a[href*='/sports/soccer/']")))
    elems = driver.find_elements_by_css_selector('.match-pop-market a[href*="/sports/soccer/"]')
    elem_href = []
    for elem in elems:
        print(elem.get_attribute("href"))
        elem_href.append(elem.get_attribute("href"))
    print("NEW LINE BREAK")

    with open('vtg121.csv', 'a', newline='', encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        for row in zip(langs_text, langs3_text, elem_href):
            writer.writerow(row)
            print(row)
Best Answer
Your XPath is incorrect. Note that a predicate like [href*="/sports/soccer/"] works in a CSS selector, but in XPath you have to write [contains(@href, "/sports/soccer/")] instead. So the complete lines should be:
from selenium.common.exceptions import TimeoutException

try:
    clickMe = wait(driver, 15).until(EC.element_to_be_clickable(
        (By.XPATH, "//*[@class='match-pop-market']//a[contains(@href, '/sports/soccer/')]")))
    clickMe.click()
except TimeoutException:
    print("No link was found")
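Since [href*="/sports/soccer/"] is valid CSS, the same fix can equivalently keep the original selector and wait with By.CSS_SELECTOR instead of translating it to XPath. An alternative sketch of the same fix:

    try:
        clickMe = wait(driver, 15).until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, '.match-pop-market a[href*="/sports/soccer/"]')))
        clickMe.click()
    except TimeoutException:
        print("No link was found")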
This question, "python - Not scraping all requested data except one CSS list on one page", and its answer are from Stack Overflow: https://stackoverflow.com/questions/46985895/