Python - Scraping with BeautifulSoup and urllib

Tags: python python-3.x beautifulsoup urllib

I'm trying to read a website, but unfortunately something goes wrong.

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('https://csgoempire.com/withdraw').read()
soup = bs.BeautifulSoup(sauce,'lxml')

print(soup.find_all('p'))

Error:

Traceback (most recent call last):
  File "F:/Informatika/Python3X/GamblinSitesBot/GamblingSitesBot.py", line 4, in <module>
    sauce = urllib.request.urlopen('https://csgoempire.com/').read()
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process finished with exit code 1

Note that the same code works for other websites, for example google.com.

Best answer

You can do the same thing with the requests library. This works fine:

import bs4 as bs
import requests

sauce = requests.get('https://csgoempire.com/withdraw')
soup = bs.BeautifulSoup(sauce.content,'html.parser')
print(soup.find_all('p'))
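
If you would rather stay with urllib, one thing worth trying is sending a browser-like User-Agent header. This rests on an assumption the answer does not confirm: that the 403 is triggered by urllib's default User-Agent (Python-urllib/3.x), which some sites reject. The sketch below is a guess at a workaround, not a guaranteed fix. The names build_request, fetch_html and BROWSER_USER_AGENT are made up for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string (an assumption, not from the
# original post). The site may still refuse the request for other reasons.
BROWSER_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that replaces urllib's default User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None on an HTTP error status."""
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # A 403 here means the server rejected the request despite the header.
        print(f"Request failed: HTTP {err.code} {err.reason}")
        return None
```

When fetch_html('https://csgoempire.com/withdraw') returns bytes rather than None, the result can be passed to bs.BeautifulSoup(..., 'html.parser') exactly as in the answer above.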

Source: a similar question on Stack Overflow about scraping with BeautifulSoup and urllib in Python: https://stackoverflow.com/questions/49796735/