python - How to scrape data from a website when the link is attached to a click event?

Tags: python web-scraping scrapy extract

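I am trying to scrape/extract the website of a company/hotel from Tripadvisor.com pages. I do not see the website URL when I inspect the page. Any ideas on how to extract the website URL using Python? Apologies in advance, as I have only recently started web scraping with Python. Thanks.

For example, see the two red arrows in the image. When I click the website link, it takes me to "http://www.i-love-my-india.com/", and that is what I want to extract using Python.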

Tripadvisor url

Best answer

Try this with Selenium:

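import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Must install geckodriver (it drives your browser); see instructions on
# http://selenium-python.readthedocs.io/installation.html.
# Change the path to where your geckodriver file is.
# Selenium 4 takes the driver path through a Service object.
browser = webdriver.Firefox(service=Service(executable_path="C:\\Users\\Vader\\geckodriver.exe"))

browser.get('https://www.tripadvisor.co.uk/Attraction_Review-g304551-d4590508-Reviews-Ashok_s_Taxi_Tours-New_Delhi_National_Capital_Territory_of_Delhi.html')
# Selenium 4 removed find_element_by_css_selector; use find_element(By.CSS_SELECTOR, ...).
browser.find_element(By.CSS_SELECTOR, '.blEntry.website').click()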

# browser.window_handles  # Now lists 2 handles: the click opened a second tab.

browser.switch_to.window(browser.window_handles[1])  # switch to the second tab

time.sleep(1)  # Reading the URL immediately gave me a 'blank' result, so I added
               # a short delay and it worked (I do not know exactly why; possibly
               # the new tab had not finished loading yet).

res = browser.current_url # the URL

print(res)

browser.quit() # Closes the browser
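
The fixed `time.sleep(1)` above can be too short on a slow connection. As a hedged alternative, here is a minimal sketch that polls until the new tab reports a real URL. The helper name `url_of_new_tab` and its parameters are hypothetical (not part of Selenium); it relies only on the driver's `window_handles`, `switch_to.window` and `current_url`, and it assumes a still-loading tab reports `about:blank`. Selenium also ships `WebDriverWait` for this kind of waiting, which may be the better fit in a larger script.

```python
import time


def url_of_new_tab(driver, original_handle, timeout=10.0, poll=0.25):
    """Switch to the tab that is not `original_handle` and return its URL.

    Hypothetical helper: polls until the new tab has navigated away from
    'about:blank', or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Any handle other than the original one belongs to the new tab.
        new_handles = [h for h in driver.window_handles if h != original_handle]
        if new_handles:
            driver.switch_to.window(new_handles[0])
            url = driver.current_url
            if url and url != "about:blank":
                return url
        time.sleep(poll)
    raise TimeoutError("no new tab finished loading within %s seconds" % timeout)


# Usage with the script above (record the handle before clicking the link):
#   original = browser.current_window_handle
#   ... click the website link ...
#   res = url_of_new_tab(browser, original)
```

Filtering by the original handle avoids assuming that the new tab is always at index 1 of `window_handles`.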

Selenium

Regarding "python - How to scrape data from a website when the link is attached to a click event?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48553427/
