python - Optimizing multiple try/except blocks in web-scraping code

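I have a web-scraping script in which, for various reasons, some of the code "breaks" when the expected information is not found. I am handling this with multiple try/except blocks.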

        asin = item.get('data-asin')
        title = item.find_all('span',{'class' : 'a-size-base-plus a-color-base a-text-normal'})[0].get_text()   

        try: 
            label = item.find_all('span',{'aria-label' : 'Escolha da Amazon'})[0].get('aria-label')
        except IndexError :
            label = None
            
        try:
            current_whole_price = item.find_all('span', {'class' : 'a-price'})[0].find('span', {'class' : 'a-price-whole'}).get_text().replace(',','').replace('.','')
        except:
            current_whole_price = '0'
        
        try : 
            current_fraction_price = item.find_all('span', {'class' : 'a-price'})[0].find('span', {'class' : 'a-price-fraction'}).get_text() 
        except : 
            current_fraction_price = '0'
        current_price = float(current_whole_price+'.'+current_fraction_price)

        try : 
            rating_info = item.find('div', {'class':'a-row a-size-small'}).get_text()
            rating = float(rating_info[:3].replace(',','.'))
            rating_count = int(re.findall(r'\d+', rating_info)[-1])
        except : 
            rating =  None
            rating_count = None

        try:
            ad = True if (item.find_all('span', {'class' : 'a-color-secondary'})[0].get_text() == 'Patrocinado') else False
        except IndexError:
            ad = False
        
        _ = {'productId' : itemId,
            'asin' : asin,
            'opt_label' : label,
            #"ad": True if (item.find_all('span', {'class' : 'a-color-secondary'})[0].get_text() == 'Patrocinado') else False ,
            "ad": ad,
            'title' : title,
            'current_price' : current_price,
            'url':f'https://www.amazon.com.br/dp/{asin}',
            'rating' : rating,
            'rating_count' : rating_count,
            }

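However, looking at my code, you can see that many of the try/except blocks are similar. I would like to know whether I can use some kind of function to which I pass the item, the desired selector, and a failsafe value to return when something goes wrong.

My goal is to make the code simpler and shorter. Any tips are welcome!

Regards!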

Best answer

Yes, you can write a function to handle the repeated try/except blocks, but you need a generic way to fetch each field. It could look something like this:

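def get_element(item, selector, attribute, failsafe_value=None):
    # Look up the first tag matching `selector` and read `attribute` from it.
    try:
        element = item.find(selector).get(attribute)
        # A missing or empty attribute also falls back to the failsafe value.
        return element if element else failsafe_value
    except (AttributeError, IndexError):
        # find() returned None (no match), so .get() raised AttributeError.
        return failsafe_value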
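
The fields in the question are not all read the same way (some use `find_all(...)[0]`, some chain `.find(...).get_text()`, some post-process the text), so a single selector/attribute signature may not cover every case. One possible variation, sketched here as an assumption rather than as part of the original answer, is to pass the whole lookup as a callable and let the helper deal only with the fallback. The name `safe_extract` and the stand-in values `spans` and `missing` are hypothetical and are used only so the sketch runs without a real page.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def safe_extract(
    extract: Callable[[], T],
    default: T,
    exceptions: Tuple[Type[BaseException], ...] = (
        AttributeError,
        IndexError,
        TypeError,
        ValueError,
    ),
) -> T:
    """Call extract() and return its result, or default if it raises
    one of the listed exception types."""
    try:
        return extract()
    except exceptions:
        return default


# Stand-ins for scraping results (hypothetical, no real HTML involved):
spans = []      # as if find_all() matched nothing
missing = None  # as if find() matched nothing

# Indexing an empty result list raises IndexError, so the default is used.
label = safe_extract(lambda: spans[0].get("aria-label"), None)

# Calling a method on None raises AttributeError, so the default is used.
fraction = safe_extract(lambda: missing.get_text(), "0")

# When nothing goes wrong, the extracted value is returned unchanged.
whole = safe_extract(lambda: "1.234".replace(",", "").replace(".", ""), "0")
```

In the scraper, each try/except pair would then collapse to one line, for example `label = safe_extract(lambda: item.find_all('span', {'aria-label': 'Escolha da Amazon'})[0].get('aria-label'), None)`. Listing the exception types explicitly, instead of using a bare `except:`, keeps unrelated errors such as a `NameError` from a typo visible.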

Regarding "python - Optimizing multiple try/except blocks in web-scraping code", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/77720049/
