python - Saving web page content with BeautifulSoup

Tags: python python-3.x web-scraping beautifulsoup

I am trying to scrape a web page with BeautifulSoup using the following code:

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("http://en.wikipedia.org//wiki//Markov_chain.htm") as url:
    s = url.read()

soup = BeautifulSoup(s, "html.parser")  # name the parser explicitly

with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())

The problem is that it saves Wikipedia's main page instead of that specific article. Why doesn't this address work, and how should I change it?

Best answer

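The URL in the question has doubled slashes in the path and a spurious .htm suffix. The correct URL for the page is http://en.wikipedia.org/wiki/Markov_chain :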

>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://en.wikipedia.org/wiki/Markov_chain"
>>> soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser")
>>> soup.title
<title>Markov chain - Wikipedia, the free encyclopedia</title>
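To make the difference between the two addresses concrete, here is a minimal sketch that turns the URL from the question into the one used above. The helper name normalize_article_url is hypothetical, and the cleanup rules (collapse repeated slashes, drop a trailing .htm or .html) are assumptions that fit this particular URL, not a general rule for every site.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_article_url(url):
    """Hypothetical helper: tidy the path of an article URL.

    Assumption: repeated slashes and a trailing .htm/.html are mistakes,
    which is true for the Wikipedia URL in the question but not for
    every website.
    """
    parts = urlsplit(url)
    # Collapse runs of slashes in the path into a single slash.
    path = re.sub(r"/{2,}", "/", parts.path)
    # Drop a trailing .htm or .html extension.
    path = re.sub(r"\.html?$", "", path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(normalize_article_url("http://en.wikipedia.org//wiki//Markov_chain.htm"))
# http://en.wikipedia.org/wiki/Markov_chain
```

Checking soup.title before writing the file, as in the session above, is a quick way to confirm that the article was fetched and not some other page.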

A similar question about "python - Saving web page content with BeautifulSoup" can be found on Stack Overflow: https://stackoverflow.com/questions/25256890/
