python - Scraping a web page with requests doesn't return all the data

Tags: python html web-scraping beautifulsoup python-requests

I'm using the Python requests package to scrape a web page. Here is the code:

import requests
from bs4 import BeautifulSoup

# Configure Settings
url = "https://mangaabyss.com/read/"
comic = "the-god-of-pro-wrestling"

# Run Scraper
page = requests.get(url + comic + "/")

soup = BeautifulSoup(page.content, 'html.parser')

print(soup.prettify())

The URL it requests is "https://mangaabyss.com/read/the-god-of-pro-wrestling/", but in the soup output I only get the top-level div, with none of its child elements. Here is the output I get:

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <link href="/favicon.ico" rel="icon"/>
  <meta content="width=device-width,initial-scale=1,minimum-scale=1,maximum-scale=1,viewport-fit=cover" name="viewport"/>
  <meta content="#250339" name="theme-color"/>
  <title>
   MANGA ABYSS
  </title>
  <script crossorigin="" src="/assets/index.f4dc01fb.js" type="module">
  </script>
  <link href="/assets/index.9b4eb8b4.css" rel="stylesheet"/>
 </head>
 <body>
  <div id="manga-mobile-app">
  </div>
 </body>
</html>

The content I want to scrape is nested deep inside that div. I'm trying to extract the number of chapters. Here is its selector:

#manga-mobile-app > div > div.comic-info-component > div.page-normal.with-margin > div.comic-deatil-box.tab-content.a-move-in-right > div.comic-episodes > div.episode-header.f-clear > div.f-left > span

Can anyone help me figure out where I'm going wrong?
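A quick heuristic for spotting this situation: if the element your selector starts from (`#manga-mobile-app` here) arrives with no children and the page loads a `<script type="module">` bundle, the content is almost certainly rendered client-side by JavaScript, which requests does not run. The sketch below checks this using only the standard library's `html.parser`, so it has no dependency on BeautifulSoup. `looks_client_rendered` and `MountPointInspector` are hypothetical helper names, and this is a rough heuristic, not a definitive test.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not open a nesting level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MountPointInspector(HTMLParser):
    """Records the children of one element (by id) and any module scripts."""

    def __init__(self, mount_id: str) -> None:
        super().__init__()
        self.mount_id = mount_id
        self.found = False            # did we see the element with this id?
        self.depth = 0                # > 0 while inside the mount element
        self.child_tags: list[str] = []
        self.module_scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            # Any start tag while inside the mount element is a descendant.
            self.child_tags.append(tag)
            if tag not in VOID_TAGS:
                self.depth += 1
        elif attrs.get("id") == self.mount_id:
            self.found = True
            self.depth = 1
        if tag == "script" and attrs.get("type") == "module":
            self.module_scripts.append(attrs.get("src") or "")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are recorded but never change the depth.
        if self.depth:
            self.child_tags.append(tag)

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def looks_client_rendered(html: str, mount_id: str) -> bool:
    """True if the mount element exists, is empty, and a module script is loaded."""
    inspector = MountPointInspector(mount_id)
    inspector.feed(html)
    inspector.close()
    return inspector.found and not inspector.child_tags and bool(inspector.module_scripts)


# Trimmed copy of the HTML that requests returned in the question.
RAW_HTML = """<!DOCTYPE html>
<html lang="en">
 <head>
  <title>MANGA ABYSS</title>
  <script crossorigin="" src="/assets/index.f4dc01fb.js" type="module"></script>
 </head>
 <body>
  <div id="manga-mobile-app"></div>
 </body>
</html>"""

print(looks_client_rendered(RAW_HTML, "manga-mobile-app"))
```

With `RAW_HTML` copied from the output above, this should print `True`: the mount div is empty and `/assets/index.f4dc01fb.js` is loaded as a module. Seeing that is a strong hint to look for the JSON request in the browser's network tab instead of parsing the HTML.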

Best answer

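The page is rendered client-side: the empty <div id="manga-mobile-app"> is filled in by the JavaScript bundle (/assets/index.f4dc01fb.js), which loads the data from a separate URL. requests does not execute JavaScript, so BeautifulSoup never sees that content. You can call that URL directly with the requests module: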

import json
import requests

slug = "the-god-of-pro-wrestling"
url = "https://mangaabyss.com/circinus/Manga.Abyss.v1/ComicDetail?slug="

data = requests.get(url + slug).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

for ch in data["data"]["chapters"]:
    print(
        ch["chapter_name"],
        "https://mangaabyss.com/read/{}/{}".format(slug, ch["chapter_slug"]),
    )

Prints:

...

Chapter 4 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-4
Chapter 3 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-3
Chapter 2 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-2
Chapter 1 https://mangaabyss.com/read/the-god-of-pro-wrestling/chapter-1
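Since the question asks for the number of chapters, here is a sketch that wraps the answer's approach in small helpers. It assumes the response has the shape the answer's loop relies on (`data["data"]["chapters"]`, each entry with `chapter_name` and `chapter_slug`), and that every chapter appears in that list, which has not been verified against the site. The endpoint is an undocumented site API and may change. `fetch_comic_detail`, `chapter_count`, and `chapter_links` are hypothetical names. The fetch passes the slug through `params=` rather than string concatenation, adds a timeout, and calls `raise_for_status()` so an HTTP error fails loudly instead of surfacing as a confusing JSON error.

```python
# Endpoint taken from the answer above; it is not a documented public API.
API_URL = "https://mangaabyss.com/circinus/Manga.Abyss.v1/ComicDetail"


def fetch_comic_detail(slug: str, timeout: float = 10.0) -> dict:
    """Fetch the JSON the page's JavaScript loads (needs network access)."""
    import requests  # imported here so the offline helpers below don't need it

    resp = requests.get(API_URL, params={"slug": slug}, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return resp.json()


def chapter_count(data: dict) -> int:
    """Number of chapters, assuming the response shape used in the answer."""
    return len(data["data"]["chapters"])


def chapter_links(slug: str, data: dict) -> list[tuple[str, str]]:
    """(chapter name, reader URL) pairs, built the same way as in the answer."""
    return [
        (ch["chapter_name"], f"https://mangaabyss.com/read/{slug}/{ch['chapter_slug']}")
        for ch in data["data"]["chapters"]
    ]


if __name__ == "__main__":
    # Hand-written sample in the assumed shape; not real data from the site.
    sample = {"data": {"chapters": [
        {"chapter_name": "Chapter 2", "chapter_slug": "chapter-2"},
        {"chapter_name": "Chapter 1", "chapter_slug": "chapter-1"},
    ]}}
    print(chapter_count(sample))  # 2
    for name, link in chapter_links("the-god-of-pro-wrestling", sample):
        print(name, link)
```

Against the live site, the usage would be `data = fetch_comic_detail("the-god-of-pro-wrestling")` followed by `chapter_count(data)`. Keeping the parsing separate from the network call lets you check the helpers offline with a saved response.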

A similar question, "python - Scraping a web page with requests doesn't return all the data", can be found on Stack Overflow: https://stackoverflow.com/questions/73097598/
