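Very simple Python 3 web scraper not working

I'm reading a book called "The Self-Taught Programmer" and ran into a problem with some of its Python code. I got the program to run without any errors. The problem is that it produces no output.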

import urllib.request
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self, site):
        self.site = site

    def scrape(self):
        r = urllib.request\
            .urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        sp = BeautifulSoup(html, parser)
        for tag in sp.find_all("a"):
            url = tag.get("href")
            if url is None:
                continue
            if "html" in url:
                print("\n" + url)

news = "https://news.google.com/"
Scraper(news).scrape()

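Best answer

Look at the last "if" statement. If a URL does not contain the text "html", nothing is printed for it, so a page whose links all lack "html" produces no output at all. Try removing that check and dedenting the print call so it runs for every link that has an href: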

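import urllib.request
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self, site):
        self.site = site

    def scrape(self):
        # Fetch the page and parse it with the built-in HTML parser.
        r = urllib.request\
            .urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        sp = BeautifulSoup(html, parser)
        for tag in sp.find_all("a"):
            url = tag.get("href")
            # Skip <a> tags that have no href attribute.
            if url is None:
                continue
            # Print every link; the "html" filter is gone.
            print("\n" + url)

news = "https://news.google.com/"
Scraper(news).scrape()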
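To see the effect of the filter without hitting the network, here is a minimal offline sketch. It is not the book's code: it swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it runs with no extra installs, and the names LinkCollector, extract_links, and SAMPLE_HTML are hypothetical, as is the sample markup (the real Google News page will differ).

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a fetched page.
# Note that none of the href values contain the text "html".
SAMPLE_HTML = """
<a href="./topics/world">World</a>
<a href="/stories/123">Story</a>
<a>No href here</a>
<a href="https://example.com/about">About</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:  # same check as "if url is None: continue"
            self.links.append(href)


def extract_links(html, must_contain=None):
    """Return all hrefs, optionally keeping only those containing a substring."""
    collector = LinkCollector()
    collector.feed(html)
    if must_contain is None:
        return collector.links
    return [url for url in collector.links if must_contain in url]


filtered = extract_links(SAMPLE_HTML, must_contain="html")  # original behavior
unfiltered = extract_links(SAMPLE_HTML)                      # behavior after the fix
print(filtered)    # []
print(unfiltered)  # ['./topics/world', '/stories/123', 'https://example.com/about']
```

With the "html" check in place the list is empty, which matches the silent run described in the question; without it, every link that has an href is reported.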

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52249132/
