My code:
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "https://realpython.com/practice/profiles.html"
html_page = urlopen(url)
html_text = html_page.read()
soup = BeautifulSoup(html_text)
links = soup.find_all('a', href = True)
files = []
base = "https://realpython.com/practice/"
def page_names():
    for a in links:
        files.append(base + a['href'])
page_names()
for i in files:
    all_page = urlopen(i)
all_text = all_page.read()
all_soup = BeautifulSoup(all_text)
print all_soup
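The first half of the script collects three links, and the second half is supposed to print the HTML of all of them.
Unfortunately, it only prints the HTML of the last link.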
It might be because of this part:
for i in files:
    all_page = urlopen(i)
That loop used to take 8 lines of code, but I wanted to clean it up and boil it down to these two lines. Apparently that doesn't work.
There's no error, though!
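Best answer
You only keep the value from the last iteration of the loop. You need to move all of the assignments and the print inside the loop: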
for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
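A minimal, network-free sketch of the same pattern in Python 3 may help show why only the last page came out. The function names and sample file names are hypothetical; the strings just stand in for the fetched pages.

```python
# Minimal illustration of the bug: only the assignment is inside the loop,
# so each pass rebinds the same name and only the final value survives.
def process_outside(items):
    for item in items:
        current = item.upper()   # rebinds 'current' on every iteration
    return [current]             # runs once, after the loop: last item only


def process_inside(items):
    results = []
    for item in items:
        current = item.upper()
        results.append(current)  # runs once per item, inside the loop
    return results


pages = ["a.html", "b.html", "c.html"]  # hypothetical page names
print(process_outside(pages))  # ['C.HTML']
print(process_inside(pages))   # ['A.HTML', 'B.HTML', 'C.HTML']
```

The only difference between the two functions is which lines sit inside the loop body, which is exactly the indentation issue in the question.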
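If you are going to use a function, I would pass in the arguments and create the list inside the function; otherwise you may get unexpected output: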
def page_names(b,lnks):
    files = []
    for a in lnks:
        files.append(b + a['href'])
    return files
for i in page_names(base,links):
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup
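One concrete form of that unexpected output: the question's page_names() appends to a module-level list, so every call adds the same URLs again. The sketch below assumes plain href strings instead of BeautifulSoup tags, and the names page_names_global, HREFS, and the page file names are hypothetical.

```python
# Hypothetical stand-ins for the scraped hrefs (plain strings, not bs4 tags).
BASE = "https://realpython.com/practice/"
HREFS = ["page1.html", "page2.html"]

files = []  # module-level list shared by every call, as in the question


def page_names_global():
    # Appends to the shared list, so repeated calls pile up duplicates.
    for href in HREFS:
        files.append(BASE + href)


def page_names(b, lnks):
    # Builds and returns a fresh list on every call, as the answer suggests.
    result = []
    for href in lnks:
        result.append(b + href)
    return result


page_names_global()
page_names_global()                   # second call appends the same URLs again
print(len(files))                     # 4: each URL now appears twice
print(len(page_names(BASE, HREFS)))   # 2, however many times it is called
```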
Your function can then simply return a list comprehension:
def page_names(b,lnks):
    return [b + a['href'] for a in lnks]
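urllib2 exists only in Python 2 (in Python 3, urlopen lives in urllib.request), so here is a rough Python 3 sketch of the whole approach using only the standard library. It swaps in html.parser.HTMLParser for BeautifulSoup and urllib.parse.urljoin for string concatenation; urljoin resolves each href against the index page's URL, so a separate base string isn't needed for relative links. The names LinkCollector, collect_urls, fetch_text, and fetch_all_pages are hypothetical, and the fetch parameter exists only so the function can be exercised without a network connection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (a stand-in for soup.find_all('a', href=True))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip <a> tags without an href attribute
                self.hrefs.append(href)


def collect_urls(base, html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    # urljoin resolves each href relative to base (also handles "/x" and absolute URLs).
    return [urljoin(base, href) for href in parser.hrefs]


def fetch_text(url):
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all_pages(index_url, fetch=fetch_text):
    pages = []
    for page_url in collect_urls(index_url, fetch(index_url)):
        pages.append(fetch(page_url))  # inside the loop: one fetch per link
    return pages
```

Assuming the practice page is still online, `for text in fetch_all_pages("https://realpython.com/practice/profiles.html"): print(text)` would print each linked page's HTML, one per iteration.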
For the original question, "Python: parsing from a list only prints the last item, not all of them?", see Stack Overflow: https://stackoverflow.com/questions/29733903/