python - I want to scrape multiple pages, but I only get the result for the last URL. Why?

Tags: python loops for-loop url web-scraping

Why does the output only contain the result for the last URL? Is there something wrong with my code?

import requests as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

# Can I use a while loop instead of a for loop?
for page in np.arange(1,15):
    url = uReq.get('https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9'.format(page)).text 

# I used a for loop, but I only get the result for the last URL
page_soup = soup(url,"html.parser")
info = page_soup.findAll("div",{"class: ","row detail_row"})

# Will the output for every URL end up in one file?
filename = "wheel.csv"
file = open(filename,"w",encoding="utf-8")
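There is also a typo in the `attrs` argument of `findAll("div",{"class: ","row detail_row"})`, separate from the loop problem. The two strings are separated by a comma rather than a colon, so Python builds a set, not a dict. The snippet below only demonstrates that difference in types. The dict form is presumably what was intended.

```python
# The attrs argument as written in the question: a comma between the two
# strings makes this a set literal containing two strings.
attrs_as_written = {"class: ", "row detail_row"}

# Presumed intent: a dict mapping the attribute name to the wanted value.
attrs_intended = {"class": "row detail_row"}

print(type(attrs_as_written).__name__)  # set
print(type(attrs_intended).__name__)    # dict
```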

Best answer

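Check the indentation of the code after the for loop. It is not indented, so it is not part of the loop body and runs only once, after the loop has finished. On every iteration the variable url is overwritten with the HTML of the current page. By the time the parsing code runs, only the last page is left. Indent everything that should happen for each page into the loop: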
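The minimal offline sketch below shows why only the last page survives. `fetch` is a hypothetical stand-in for `uReq.get(...).text`, and pages 1 to 3 stand in for the real range. The question also asks about while loops, so the sketch includes an equivalent while loop. It works as long as the counter is incremented by hand, though for a fixed range of pages the for loop is the simpler choice.

```python
# Hypothetical stand-in for uReq.get(...).text so the example runs offline.
def fetch(page):
    return f"<html>page {page}</html>"

# Pattern from the question: the loop only reassigns the variable, so the
# processing step after the loop sees just the final value.
for page in range(1, 4):
    html = fetch(page)
processed_after_loop = [html]

# Pattern from the answer: the processing step is indented into the loop
# body, so it runs once per page.
processed_inside_loop = []
for page in range(1, 4):
    html = fetch(page)
    processed_inside_loop.append(html)

# Equivalent while loop: the page counter must be incremented manually,
# otherwise the loop never terminates.
processed_with_while = []
page = 1
while page < 4:
    html = fetch(page)
    processed_with_while.append(html)
    page += 1

print(processed_after_loop)                           # ['<html>page 3</html>']
print(len(processed_inside_loop))                     # 3
print(processed_inside_loop == processed_with_while)  # True
```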

import requests as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

for page in np.arange(1,15):
    url = uReq.get('https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9'.format(page)).text 

    # indented into the loop: this now runs once per page (pages 1-14)
    page_soup = soup(url,"html.parser")
    # attrs must be a dict ({"class": ...}), not a set ({"class: ", ...})
    info = page_soup.findAll("div",{"class": "row detail_row"})

    # append the results to the csv file
    filename = "wheel.csv"
    file = open(filename,"a",encoding="utf-8")
    ...  # code for writing in the csv file
    file.close()

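All of the results then end up in the file. The parsing and writing now run once per page. The file is opened in append mode ("a") instead of "w", so each page's rows are added rather than overwriting what earlier iterations wrote. Note that you should also close the file so that its contents are saved.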
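Another option is to open the file once, before the loop, using a `with` block and the standard `csv` module. The minimal sketch below assumes this approach. `fetch_page`, `extract_rows`, `scrape_to_csv` and the `name` column are hypothetical names. The parser is faked so the example runs offline, and in the real script it would be the BeautifulSoup code from the answer.

```python
import csv

def fetch_page(page):
    # Hypothetical stand-in for uReq.get(URL.format(page)).text; fake data.
    return f"Dealer {page}A|Dealer {page}B"

def extract_rows(html):
    # Hypothetical parser. In the real script this step would use BeautifulSoup,
    # e.g. page_soup.find_all("div", {"class": "row detail_row"}).
    return [[name] for name in html.split("|")]

def scrape_to_csv(filename, pages):
    # Open the file once, before the loop, so "w" truncates it only once
    # and every page's rows end up in the same file.
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name"])  # header row (hypothetical column name)
        for page in pages:
            html = fetch_page(page)
            for row in extract_rows(html):
                writer.writerow(row)
    # Leaving the with block closes the file, which flushes it to disk.
```

Calling `scrape_to_csv("wheel.csv", range(1, 15))` would cover the same pages as `np.arange(1,15)`. Unlike reopening the file in append mode, this version starts from an empty file on each run, so rows from earlier runs do not accumulate.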

Source: the original question, "python - I want to scrape multiple pages, but I only get the result for the last URL. Why?", is on Stack Overflow: https://stackoverflow.com/questions/66596538/