python - Extracting the title of a URL with Scrapy and Python

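I am new to Scrapy and Python. I need to extract only the title of each URL, not the page content. The code below extracts both the content and the title. Could you help me with this?

Thank you in advance.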

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'bg'
    start_urls = ['https://blog.scrapinghub.com', 'https://scrapinghub.com/']

    def parse(self, response):
        # Yield the link text found inside each post heading.
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        # This part saves the whole page body to disk, i.e. the content
        # that the question wants to leave out.
        page = response.url.split("/")[-2]
        filename = 'urltitle-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)

Best answer

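I'm not sure I understand correctly what you mean by "title", but if you need the title attribute of the a tag, you can extract it with the appropriate selector: title.css('a::attr(title)'). Calling .extract_first() on the result returns the first match as a string, or None if nothing matches.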

This question, "python - Extracting the title of a URL with Scrapy and Python", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/42080992/
