javascript - Python Requests and the Forbes 'Welcome' page redirect

Tags: javascript python selenium beautifulsoup python-requests

Can Requests navigate past the Forbes welcome page? I am trying to access this article:

http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/

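For most people, this first lands on a welcome page, which then redirects to the article itself. I have noticed that in Chrome, once the article URL resolves to the actual article, a value is appended to it, though it appears to be random each time.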

http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/#216cc0922071
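
The appended value is a URL fragment (the part after `#`). Browsers keep fragments on the client side and do not send them in the HTTP request, so it is presumably set by script running in the page rather than by a server redirect. A minimal sketch using only the standard library to split the resolved URL into the two parts:

```python
from urllib.parse import urldefrag

resolved = ('http://www.forbes.com/sites/andygreenberg/2012/10/15/'
            'how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/'
            '#216cc0922071')

# urldefrag separates the URL that is requested from the server
# from the fragment that stays in the browser.
base, fragment = urldefrag(resolved)
print(base)      # the same URL as the original article link
print(fragment)  # the value Chrome shows after the '#'
```

Because the fragment never reaches the server, adding it to the URL passed to Requests would not be expected to change the response.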

I suspect cookies may be involved, but so far my code has not retrieved any HTML other than the markup that makes up the welcome page.

import requests

url = 'http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/'
hdrs = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}
session = requests.Session()
# follow HTTP redirects and keep any cookies the server sets on the session
response = session.get(url, headers=hdrs, allow_redirects=True)
print('headers', response.headers)
print('cookies', requests.utils.dict_from_cookiejar(session.cookies))
print('html', response.text)

Output

headers {'Content-Type': 'text/html;charset=utf-8', 'Backend': 'templates', 'Date': 'Tue, 30 Aug 2016 22:37:15 GMT', 'Connection': 'keep-alive', 'Accept-Ranges': 'bytes', 'Content-Language': 'en-US', 'X-Cnection': 'close', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Length': '1983', 'Server': '', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip'}
cookies {'forbesbeta': 'A'}
html <!DOCTYPE html><html class="no-js" lang=""><head><title>Forbes Welcome</title><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=2"><meta name="description" content="Forbes Welcome page -- Forbes is a global media company, focusing on business, investing, technology, entrepreneurship, leadership, and lifestyle."><meta name="keywords" content="business news, market analysis, company profiles, personal finance, management, entrepreneurship, investments, financial advice, economy, technology news"><link rel="stylesheet" href="http://i.forbesimg.com/welcomead/styles/abd4e3d6.main.css"><script type="text/javascript">fbs_settings = {
                mobile: 'false',
                preview: 'false',
                test: 'false',
                classes: 'WyJwYWdlR29vZ2xlQWRTdWJjb250ZW50IiwiYWRoaSIsImFkX2tleXdvcmRzX2JvdF9yIiwiZ29vZ2xlLWFkLWFmYy1oZWFkZXIiLCJhcnRpY2xlX2JvdHRvbV9hZCIsImFkc1lOIiwidG9wQWRXcmFwcGVyIiwicmVnaW9uLW1pZGRsZS1hZCIsImFkc0RpdiIsInNfYWQyIiwiYWR3b3JkLWJveCIsImpzLWFkLWltdSIsImFkLXNwb25zb3JlZC1wb3N0IiwiY2VudGVyQWQiLCJiei1hZCIsImFkLTcyOHg5MCIsImdwdC1hZHMiLCJzcG9uc29yLXRleHQtY29udGFpbmVyIiwiYWRfcmVjdGFuZ3VsYXIiLCJob21lQWRCb3hJbkJpZ25ld3MiLCJwb3NfYWR2ZXJ0IiwiY29udGFpbnMtYWQiLCJ0b3AtYWRzZW5zZS1iYW5uZXIiLCJwYWdlSGVhZGVyQWQiLCJibG9jay1zcG9uc29yZWQtbGlua3MiLCJhZDI1MC1oMSIsImNoYW5nZV9BZENvbnRhaW5lciIsImFkX2dyaWQiLCJzcG9uc29yLXNlcnZpY2VzIiwidmlld19hZHNfYm90dG9tX2JnIl0='
            };</script><script type="text/javascript">try {
                fbs_settings.data = {"channel":"channel_0","section":"section_0","location":"welcomead_default","panel":"welcome_ad","contentPositions":[{"position":1,"title":"Quote of the Day","description":"\"Success is a terrible thing and a wonderful thing... Just do what you love.”","following":false,"byline":"Gene Wilder","hideDescription":false,"sponsored":false,"twitterHandle":"","hashtag":""}],"panelId":"panel4","limit":0,"swimlane":false,"more":false,"enableAds":false,"removeBVPrepend":false,"brandvoiceHeader":false,"profileLink":false,"fullListLink":false,"pagination":false,"filters":false,"year":0};
            } catch (err) {
                fbs_settings.data = null;
            }</script><script type="text/javascript">try {
                fbs_settings.angular_preload = ["//i.forbesimg.com/forbes/scripts/c632bd7f.vendor.js","//i.forbesimg.com/forbes/scripts/99f3b378.scripts.js","//i.forbesimg.com/forbes/styles/860430fd.main.css"];
            } catch (err) {
                fbs_settings.angular_preload = null;
            }</script><script src="http://i.forbesimg.com/welcomead/scripts/vendor/69216742.modernizr.js"></script></head><body><div id="app" class="container clearfix default-template ad-300-by-250"><div id="navigation"></div><div id="content"><div id="adblock-hover" class="hidden"><span class="close-btn preloaded"><span class="close">CLOSE</span> <i class="icon icon-close"></i></span> <img> <a href="//www.forbes.com/adblock/instructions/" target="_blank">More Options</a></div>  <script>(function() {
                        setTimeout(function() {
                            var inviEles = document.getElementsByClassName('invisible');
                            for (var ele in inviEles) {
                                if (!inviEles[0]) {
                                    return;
                                }
                                inviEles[0].className = inviEles[0].className.replace('invisible', '');
                            }
                            if (window.performance && performance.mark) {
                                performance.mark('content_visible');
                            }
                        });
                    })();</script><div class="content-container"><div class="content-inner"><h1 class="title">  <i class="invisible branding icon icon-forbes-logo"></i> <span class="top invisible">Quote of</span> <span class="bottom invisible">the Day</span></h1><div class="body">  <p class="body-content invisible">"Success is a terrible thing and a wonderful thing... Just do what you love.”</p>  <p class="body-byline invisible">Gene Wilder</p>  </div></div></div><div class="circle-wrapper"><div class="circle invisible"></div><img class="circle fallback hidden" src="http://i.forbesimg.com/welcomead/images/circle.png"></div>  </div><div id="ads"></div></div><!--[if lte IE 9]>
        <script src="http://i.forbesimg.com/welcomead/scripts/b9b8347c.legacy.js"></script>
        <![endif]--><script src="http://i.forbesimg.com/welcomead/scripts/1a364ca6.vendor.js"></script><script src="http://i.forbesimg.com/welcomead/scripts/8951c3c8.main.js"></script></body></html>

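I figured that since a browser can eventually resolve the article, Requests should be able to as well, but because I cannot work out what Forbes is doing, I cannot work out how to set the Requests parameters appropriately. Any ideas?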
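
One thing that can be checked from the Requests side is whether a response is the interstitial at all. The output above has the title `Forbes Welcome` and contains only scripts, no article body, and Requests does not execute JavaScript. A minimal sketch, assuming the title stays `Forbes Welcome` (the names `TitleParser` and `is_welcome_page` are hypothetical helpers, not part of Requests):

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_welcome_page(html, welcome_title='Forbes Welcome'):
    """Return True if the page title matches the assumed welcome title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip() == welcome_title
```

Calling `is_welcome_page(response.text)` on the response of a Requests call would then tell the script whether it received the article or the welcome page.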

Best answer

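I never got around to this at the time, but I later used Selenium in another project, and a user asked for an answer, so here are the basics of using Selenium to get past the Forbes splash page.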

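You need to install a Selenium driver: the Firefox driver, the Chrome driver, or the headless PhantomJS. On a Mac, chromedriver is easy to install through Homebrew; alternatively, copy the single PhantomJS driver file to the path indicated in the # comment in the code.
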
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/'
browser = webdriver.Chrome()  # or webdriver.PhantomJS('usr/bin/phantomjs') on Selenium releases that still include it

browser.get(url)
browser.implicitly_wait(5)  # wait up to 5 seconds when looking up elements
# a very explicit XPath to the continue button
browser.find_element(By.XPATH, '/html/body/div/div[1]/div/div[1]').click()

# now grab whatever you want from the resulting page using...

browser.find_element(By.CSS_SELECTOR, 'css selector info').get_attribute('innerHTML')
browser.find_element(By.XPATH, 'xpath info').get_attribute('innerHTML')
# 'innerHTML' returns the markup inside the selected element; other attributes are also possible, such as 'href' on an <a> tag.
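
Before grabbing content, it can help to confirm that the browser has actually left the welcome page. A minimal sketch, assuming the interstitial keeps the title `Forbes Welcome` seen in the question's output and that the article page has a different title. `wait_for_article` is a hypothetical helper; it relies only on the driver's `title` property:

```python
import time


def wait_for_article(driver, welcome_title='Forbes Welcome', timeout=15.0, poll=0.5):
    """Poll driver.title until it differs from the assumed welcome title.

    Returns True once a different, non-empty title is seen,
    or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        title = driver.title
        if title and title != welcome_title:
            return True
        time.sleep(poll)
    return False
```

It would be called as `wait_for_article(browser)` after the click. Selenium's own `WebDriverWait` class offers the same kind of polling and is probably the better choice in real code; the hand-rolled loop here just keeps the sketch free of extra imports.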

Source: the Stack Overflow question "Python Requests and Forbes 'Welcome' page redirect": https://stackoverflow.com/questions/39238323/
