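I'm using Mechanize to build a scraper that goes through a CSV file of URLs and downloads the images.
The problem is that some of the images no longer exist, and I get a 404 Not Found error. I'm new to Ruby and don't know how to handle exceptions, so I hope someone can help me.
Here is what I tried: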
require 'mechanize'
require 'csv'

agent = Mechanize.new
url = CSV.read("links.csv")
begin
  url.each do |url|
    puts url
    agent.get(url.first).save
  end
rescue Net::HTTPNotFound => e
  puts e.response_code
  agent = e.agent
end
The error I get is:
/home/miguel/.rbenv/versions/2.4.2/lib/ruby/gems/2.4.0/gems/mechanize-2.7.5/lib/mechanize/http/agent.rb:323:in `fetch': 404 => Net::HTTPNotFound for http://www.rockauto.com/info/915/FCA6366_Fronp__ra_p.jpg -- unhandled response (Mechanize::ResponseCodeError)
from descargaimagenes.rb:34:in `fetch_with_retry'
from /home/miguel/.rbenv/versions/2.4.2/lib/ruby/gems/2.4.0/gems/mechanize-2.7.5/lib/mechanize.rb:464:in `get'
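The last line of the traceback names the class that was actually raised, Mechanize::ResponseCodeError. Net::HTTPNotFound, by contrast, is the Net::HTTP class that represents a 404 *response*, not an exception, so a `rescue Net::HTTPNotFound` clause can never match. A minimal sketch using only the standard library illustrates this; `rescued_by?` is a hypothetical helper, and ArgumentError merely stands in for the error Mechanize raises.

```ruby
require 'net/http'

# Net::HTTPNotFound models a 404 response; it does not inherit from Exception.
puts Net::HTTPNotFound.ancestors.include?(Exception)         # prints false
puts Net::HTTPNotFound.ancestors.include?(Net::HTTPResponse) # prints true

# Hypothetical helper: raises an error and reports whether `klass` rescued it.
# ArgumentError is only a stand-in for Mechanize::ResponseCodeError.
def rescued_by?(klass)
  raise ArgumentError, 'stand-in for a 404 error'
rescue klass
  true
rescue StandardError
  # The first clause did not match, so the error falls through to this one.
  false
end

puts rescued_by?(Net::HTTPNotFound) # prints false: a response class never matches
puts rescued_by?(ArgumentError)     # prints true
```

Because the first clause never matches, the real Mechanize error escapes the `begin` block and the script stops, which is exactly what the traceback shows.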
Best answer
You can use the Mechanize::ResponseCodeError exception:
This error is raised when Mechanize encounters a response code it does not know how to handle. Currently, this exception will be thrown if Mechanize encounters response codes other than 200, 301, or 302. Any other response code is up to the user to handle.
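and move the rescue inside the each block, so that for every URL you fetch it and save the image, and if the resource is not found, print the response code and continue with the next URL.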
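require 'mechanize'

agent = Mechanize.new

[
  'http://www.rockauto.com/Images/whatsnew1.jpg?1512928800',
  'http://www.rockauto.com/info/915/FCA6366_Fronp__ra_p.jpg',
  'http://www.rockauto.com/Images/whatsnew2.jpg?1512928800'
].each do |url|
  begin
    # Fetch and save the image; each URL gets its own begin/rescue.
    agent.get(url).save
  rescue Mechanize::ResponseCodeError => e
    # Unhandled response code (e.g. 404): report it and move on to the next URL.
    puts e.response_code
  end
end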
There are two valid URLs with an invalid one in the middle; you should end up with two images, one for each valid URL.
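The per-URL rescue can be combined with the CSV loop from the question, and narrowed so that only 404s are skipped while other unhandled codes still stop the script. The sketch below is self-contained so it runs without the network or the mechanize gem: `FakeAgent`, `FakePage`, `FakeResponseCodeError` and `download_all` are hypothetical stand-ins that only mimic the `agent.get(url).save` and `e.response_code` calls used above, and the example.com URLs are made up.

```ruby
# Hypothetical stand-ins; with the real gem use Mechanize.new and
# Mechanize::ResponseCodeError instead.
class FakeResponseCodeError < StandardError
  attr_reader :response_code

  def initialize(code)
    @response_code = code
    super("#{code} => unhandled response")
  end
end

FakePage = Struct.new(:url, :saved) do
  def save
    self.saved = true
    self
  end
end

class FakeAgent
  def initialize(missing)
    @missing = missing # URLs that should behave like a 404
  end

  def get(url)
    raise FakeResponseCodeError.new('404') if @missing.include?(url)

    FakePage.new(url, false)
  end
end

# Saves each URL in turn. A 404 is logged and skipped; any other response
# code is re-raised. Returns [saved_urls, missing_urls].
def download_all(agent, urls, error_class)
  saved = []
  missing = []
  urls.each do |url|
    begin
      agent.get(url).save
      saved << url
    rescue error_class => e
      raise unless e.response_code.to_s == '404' # re-raise non-404 errors
      puts "#{e.response_code} #{url}"
      missing << url
    end
  end
  [saved, missing]
end

urls = %w[http://example.com/a.jpg http://example.com/missing.jpg http://example.com/b.jpg]
agent = FakeAgent.new(['http://example.com/missing.jpg'])
saved, missing = download_all(agent, urls, FakeResponseCodeError)
p saved   # prints ["http://example.com/a.jpg", "http://example.com/b.jpg"]
p missing # prints ["http://example.com/missing.jpg"]
```

With the real gem, the call would look something like `download_all(Mechanize.new, CSV.read('links.csv').map(&:first), Mechanize::ResponseCodeError)` after `require 'mechanize'` and `require 'csv'`. Passing the error class in is only a convenience for this runnable sketch; in a plain script you would name Mechanize::ResponseCodeError directly in the `rescue`.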
Original Stack Overflow question (ruby - Mechanize HTTP Not Found 404 links): https://stackoverflow.com/questions/47742140/