I'm trying to scrape a web page with BeautifulSoup using the following code:
import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("http://en.wikipedia.org//wiki//Markov_chain.htm") as url:
    s = url.read()

soup = BeautifulSoup(s)

with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())
    f.close()
The problem is that it saves Wikipedia's main page rather than that specific article. Why doesn't that address work, and how should I change it?
Best Answer
The correct URL for that page is http://en.wikipedia.org/wiki/Markov_chain (single slashes, and no ".htm" suffix):
>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://en.wikipedia.org/wiki/Markov_chain"
>>> soup = BeautifulSoup(urllib.request.urlopen(url))
>>> soup.title
<title>Markov chain - Wikipedia, the free encyclopedia</title>
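With the corrected URL, the question's saving script works as intended. A minimal sketch is below; to keep it runnable offline, the inline `html` string stands in for the bytes that `urllib.request.urlopen(url).read()` would return, and its content is illustrative only:

```python
from bs4 import BeautifulSoup

# Corrected URL: single slashes, no ".htm" suffix.
url = "http://en.wikipedia.org/wiki/Markov_chain"

# Stand-in for urllib.request.urlopen(url).read(), so this sketch runs
# without a network connection; the markup here is illustrative only.
html = ("<html><head><title>Markov chain - Wikipedia</title></head>"
        "<body><p>A Markov chain is a stochastic model.</p></body></html>")

# Naming the parser explicitly ("html.parser") avoids bs4's
# guessed-parser warning.
soup = BeautifulSoup(html, "html.parser")

# get_text() flattens all markup into plain text; the with block
# closes the file automatically, so no explicit f.close() is needed.
with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())
```

Note that the `with open(...)` context manager already closes the file on exit, which is why the `f.close()` call in the original script is redundant.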
Regarding "python - Saving webpage content with BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25256890/