python - How do I search a website's HTML with Python 3.6?

Tags: python html search python-requests python-3.6

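I have a lot of gift codes and need to build a checker that tests whether each gift is valid by searching the page's HTML for certain words. The text I am looking for is "Gift Code Invalid".

When I try to read the HTML with urllib or requests, only a small part of the HTML is loaded. I am a beginner, so I may be doing something wrong.

My code is: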

import requests
link = "https://discord.gift/o2uzOR7YE3CoBpGq"
r = requests.get(link)
print(r.text)

The output is:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta content="width=device-width, initial-scale=1.0, maximum-scale=1, user-scalable=no" name="viewport" />

    <!-- section:seometa -->
    <meta property="og:type" content="website" />
    <meta property="og:site_name" content="Discord" />
    <meta property="og:title" content="Discord - Free voice and text chat for gamers" />
    <meta
      property="og:description"
      content="Step up your game with a modern voice & text chat app. Crystal clear voice, multiple server and channel support, mobile apps, and more. Get your free server now!"
    /><meta property="og:image" content="https://discordapp.com/assets/ee7c382d9257652a88c8f7b7f22a994d.png" />    <meta name="twitter:card" content="summary_large_image" />
    <meta name="twitter:site" content="@discordapp" />
    <meta name="twitter:creator" content="@discordapp" />
    <!-- endsection -->

    <link
      rel="chrome-webstore-item"
      href="https://chrome.google.com/webstore/detail/lcbhdgefieegnkbopmgklhlpjjdgmbog"
    />
<link rel="stylesheet" href="/assets/0.830216ebaf585f92a484.css" integrity="sha256-qzZED1N67NuVMyWOdvhIGhtLtKnOXSg+F3HcanmdW4Q= sha512-D0iS5hrftKNpXWnvjpfujnvlabUq6K5gsHbsdvctRMtQXzdf2jvZ/JwaRHAPSb9Z5Xb2o8SBeXeMTajvtrkeRw=="><link rel="icon" href="/assets/07dca80a102d4149e9736d4b162cff6f.ico" />    <!-- section:title -->
    <title>Discord</title>
    <!-- endsection -->
  </head>

  <body>
    <div id="app-mount"></div><script nonce="NjksMjM0LDU4LDI4LDkxLDUxLDYzLDE3Mg==">window.__OVERLAY__ = /overlay/.test(location.pathname)</script><script nonce="NjksMjM0LDU4LDI4LDkxLDUxLDYzLDE3Mg==">window.GLOBAL_ENV = {
      API_ENDPOINT: '//discordapp.com/api',
      WEBAPP_ENDPOINT: '//discordapp.com',
      CDN_HOST: 'cdn.discordapp.com',
      ASSET_ENDPOINT: 'https://discordapp.com',
      WIDGET_ENDPOINT: '//discordapp.com/widget',
      INVITE_HOST: 'discord.gg',
      GIFT_CODE_HOST: 'discord.gift',
      MARKETING_ENDPOINT: '//discordapp.com',
      NETWORKING_ENDPOINT: '//router.discordapp.net',
      RELEASE_CHANNEL: 'stable',
      BRAINTREE_KEY: 'production_5st77rrc_49pp2rp4phym7387',
      STRIPE_KEY: 'pk_live_CUQtlpQUF0vufWpnpUmQvcdi',
    };</script><script nonce="NjksMjM0LDU4LDI4LDkxLDUxLDYzLDE3Mg==">!function(){if(null!=window.WebSocket){var n=function(n){try{var e=localStorage.getItem(n);return null==e?null:JSON.parse(e)}catch(n){return null}},e=n("token"),o=n("gatewayURL");if(e&&o){var r=null!=window.DiscordNative||null!=window.require?"etf":"json",t=o+"/?encoding="+r+"&v=6";void 0!==window.Uint8Array&&(t+="&compress=zlib-stream"),console.log("[FAST CONNECT] "+t+", encoding: "+r+", version: 6");var a=new WebSocket(t);a.binaryType="arraybuffer";var i=Date.now(),s={open:!1,gateway:t,messages:[]};a.onopen=function(){console.log("[FAST CONNECT] connected in "+(Date.now()-i)+"ms"),s.open=!0},a.onclose=a.onerror=function(){window._ws=null},a.onmessage=function(n){s.messages.push(n)},window._ws={ws:a,state:s}}}}();</script><script src="/assets/294f56f239ff22f62fc1.js" integrity="sha256-wTRQJKoqMfG3makS9dDuuegpcHSdaGmfoEBQUPXMdDM= sha512-OVrPyjx2akoJ6QS8OZ+9blz/ADtDHruxw4gwLsjfDVUgolO1ZtcgWbOo0Zj9JBNyzAjKOSCfoFoN9lnkF0EYCw=="></script><script src="/assets/eaa48b00154d2e7ac545.js" integrity="sha256-FRTrm1gL5gkDUoKwVuL9hrrmllKXQsZg7r5zy0Xo4bo= sha512-QZ4c5JQKE5rLJf1uGLQaHHL4NpkAigt4TtluicuMZDYDE5fiL7wkaD2CMBxr0xhOO5aNfSFCxcaqBkU/xOEggQ=="></script><script src="/assets/c73d229b094bb39f0686.js" integrity="sha256-thaBLLvK6Up+B8O7zIOF9Uv8IF+gwGuOW+WUe26l/vk= sha512-5ez2fLO3oMI1UPZDif1Szfjwz04ftTNfhWWSqM81hNhuVN7kckAAZR5a1SuQG8rgsqXwN1is53uAL5M2rz/FOg=="></script>  </body>
</html>

As you can see in the screenshot, the site's HTML contains the text "gift code invalid", but that string is not in the Python output.

https://ctrlv.cz/kKd3

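Best answer

The "Gift Code Invalid" text you are looking for is probably rendered by JavaScript. requests does not execute JavaScript, so the text is never part of the response it returns, which is why you cannot find it.

If you are using Python 3.6, try requests-html, which can render pages whose content is produced by JavaScript.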
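To see this concretely, it helps to list what the static HTML actually contains. The sketch below uses only the standard library and a trimmed, hand-copied excerpt of the output from the question. The names `STATIC_HTML`, `VisibleText`, and `summarize` are illustrative and not part of any library.

```python
from html.parser import HTMLParser

# Trimmed excerpt of the response shown in the question (most tags omitted).
STATIC_HTML = """<!DOCTYPE html>
<html>
  <head><title>Discord</title></head>
  <body>
    <div id="app-mount"></div>
    <script src="/assets/294f56f239ff22f62fc1.js"></script>
  </body>
</html>"""


class VisibleText(HTMLParser):
    """Collects text outside <script>/<style> and counts <script> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.script_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that a reader could see on the page.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def summarize(html):
    """Return (visible text chunks, number of <script> tags)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks, parser.script_count


chunks, scripts = summarize(STATIC_HTML)
print("visible text:", chunks)
print("script tags:", scripts)
print("contains target:", any("Gift Code Invalid" in c for c in chunks))
```

The only visible text is the page title, and the `app-mount` div is empty. Everything else on the page is filled in later by the scripts, which a plain HTTP client never runs.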

Updated example:

from requests_html import HTMLSession

link = 'https://discord.gift/o2uzOR7YE3CoBpGq'
targetString = "Gift Code Invalid"
session = HTMLSession()
r = session.get(link)
print("Before render is call: ", targetString in r.html.text)
# sleep pauses after the initial render so the page's JavaScript has time to fill in the content
r.html.render(wait=2, sleep=1)
print("After render is call: ", targetString in r.html.text)

Output:

Before render is call:  False
After render is call:  True
Process finished with exit code 0
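Note that `in` is case-sensitive, and the question writes the phrase as "gift code invalid" while the example searches for "Gift Code Invalid". A more forgiving check might look like the sketch below. The helper names `contains_phrase` and `rendered_text` are hypothetical. It assumes requests-html is installed, which downloads Chromium the first time `render` is called.

```python
def contains_phrase(text, phrase):
    """Case-insensitive search that also ignores differences in whitespace."""
    def normalize(s):
        return " ".join(s.split()).casefold()

    return normalize(phrase) in normalize(text)


def rendered_text(url, wait=2, sleep=1):
    """Fetch a page and return its text after JavaScript has run.

    Imported lazily so contains_phrase can be used without requests-html.
    """
    from requests_html import HTMLSession  # third-party: pip install requests-html

    session = HTMLSession()
    r = session.get(url)
    r.html.render(wait=wait, sleep=sleep)
    return r.html.text


if __name__ == "__main__":
    link = "https://discord.gift/o2uzOR7YE3CoBpGq"  # URL from the question
    print(contains_phrase(rendered_text(link), "gift code invalid"))
```

Keeping the string comparison separate from the network call means the matching logic can be checked on sample text without rendering a page.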

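See the library's documentation for other methods, such as finding by element or even converting the response into an lxml object after rendering: https://html.python-requests.org/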

This question, "python - How do I search a website's HTML with Python 3.6?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/54636399/
