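I get an IndexError while iterating
Problem: the program runs fine until everything is done and there are no more "subpages" left to visit; then it crashes, so the results cannot be saved to a .txt file.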
Traceback (most recent call last):
    newUrl = nextpage[counter]['href']
IndexError: list index out of range
Code
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import json

class Olx():
    def __init__(self, url):
        self.url = url

    def getPrice(self):
        """Get prices from olx"""
        html = urlopen(self.url)
        bs = BeautifulSoup(html, 'html.parser')
        price = bs.findAll('p', class_='price')
        return price

    def nextPage(self):
        """Go to the next page"""
        html = urlopen(self.url)
        bs = BeautifulSoup(html, 'html.parser')
        pageButton = bs.findAll('a', {'class': 'block br3 brc8 large tdnone lheight24'})
        try:
            return pageButton
        except AttributeError:
            None
        else:
            return pageButton

olxprices = Olx('https://www.olx.pl/nieruchomosci/mieszkania/wynajem/olsztyn/').getPrice()
nextpage = Olx('https://www.olx.pl/nieruchomosci/mieszkania/wynajem/olsztyn/').nextPage()
counter = 0
output = []
while len(nextpage) > 0:
    for price in olxprices:
        output.append(price.get_text().strip())
        print(price.get_text().strip())
    newUrl = nextpage[counter]['href']
    olxprices = Olx(newUrl).getPrice()
    counter += 1
print(output)
Best answer
len(nextpage) never changes, so the while loop never ends, and eventually the counter index runs past the end of nextpage. Instead, do the following:
for page in nextpage:
    for price in olxprices:
        output.append(price.get_text().strip())
        print(price.get_text().strip())
    newUrl = page['href']
    olxprices = Olx(newUrl).getPrice()
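
To see the difference between the two loops without hitting the network, here is a minimal sketch that swaps the scraper for in-memory data. FAKE_SITE, get_prices, collect_with_while and collect_with_for are hypothetical names made up for this illustration, and the prices are invented; only the loop structure mirrors the code above.

```python
# Hypothetical stand-ins for the scraped data: no network access is needed.
# FAKE_SITE maps a URL to the prices found on that page (invented values).
FAKE_SITE = {
    "start": ["1000 PLN", "1200 PLN"],
    "page2": ["900 PLN"],
    "page3": ["1500 PLN"],
}


def get_prices(url):
    """Stand-in for Olx(url).getPrice(): return the prices on one page."""
    return FAKE_SITE[url]


# Stand-in for the list returned by nextPage(): one dict per pagination link.
nextpage = [{"href": "page2"}, {"href": "page3"}]


def collect_with_while(links):
    """Reproduce the original loop: the condition never becomes false."""
    output = []
    prices = get_prices("start")
    counter = 0
    while len(links) > 0:  # len(links) is constant, so this is always True
        output.extend(prices)
        new_url = links[counter]["href"]  # fails once counter == len(links)
        prices = get_prices(new_url)
        counter += 1
    return output


def collect_with_for(links):
    """Iterate over the links directly, so the loop stops after the last one."""
    output = []
    prices = get_prices("start")
    for page in links:
        output.extend(prices)
        prices = get_prices(page["href"])
    output.extend(prices)  # also keep the prices fetched for the final link
    return output


try:
    collect_with_while(nextpage)
except IndexError as exc:
    print("while loop failed:", exc)

all_prices = collect_with_for(nextpage)
print(all_prices)
```

One detail worth noting: in the for loop shown in the answer, the prices fetched for the last link are assigned to olxprices but never appended to output. The sketch adds them after the loop; whether that is wanted depends on what the pagination links actually point to, which is an assumption here.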
Regarding "python - IndexError while iterating", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66663535/