I'm new to Ruby and am using Nokogiri to parse HTML pages. A function throws an error when execution reaches the following line:
currentPage = Nokogiri::HTML(open(url))
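I have verified the function's input: url is a string containing a web address. The line above works exactly as expected outside the function, but not inside it. When execution reaches that line inside the function, it throws the following error: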
WebCrawler.rb:25:in `explore': undefined method `+@' for #<Nokogiri::HTML::Document:0x007f97ea0cdf30> (NoMethodError)
from WebCrawler.rb:43:in `<main>'
The function containing the offending line is pasted below.
def explore(url)
  if CRAWLED_PAGES_COUNTER > CRAWLED_PAGES_LIMIT
    return
  end
  CRAWLED_PAGES_COUNTER++
  currentPage = Nokogiri::HTML(open(url))
  links = currentPage.xpath('//@href').map(&:value)
  eval_page(currentPage)
  links.each do |link|
    puts link
    explore(link)
  end
end
Here is the complete program (it isn't long):
require 'nokogiri'
require 'open-uri'
# Crawler Params
START_URL = "https://en.wikipedia.org"
CRAWLED_PAGES_COUNTER = 0
CRAWLED_PAGES_LIMIT = 5
# Crawler Functions
def explore(url)
  if CRAWLED_PAGES_COUNTER > CRAWLED_PAGES_LIMIT
    return
  end
  CRAWLED_PAGES_COUNTER++
  currentPage = Nokogiri::HTML(open(url))
  links = currentPage.xpath('//@href').map(&:value)
  eval_page(currentPage)
  links.each do |link|
    puts link
    explore(link)
  end
end
def eval_page(page)
  puts page.title
end
# Start Crawling
explore(START_URL)
Best answer
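The error looks like a parsing issue rather than a Nokogiri one. Ruby has no ++ operator, so the first + after the counter is read as a binary plus and the second as a unary plus applied to whatever comes on the next line. The unary plus is the method +@, which the document object does not define. Here is a minimal sketch of that parse. Page is a hypothetical stand-in for Nokogiri::HTML::Document so the example runs without Nokogiri or network access.

```ruby
# Hypothetical stand-in for Nokogiri::HTML::Document (an assumption for this demo).
class Page
end

counter = 0
message = nil

begin
  # Ruby has no ++ operator. The next two lines are read as ONE expression:
  #   counter + (+(page = Page.new))
  # so unary plus (the +@ method) is called on the new Page object.
  counter++
  page = Page.new
rescue NoMethodError => e
  message = e.message
end

# The message names the missing +@ method, as in the error from the question.
puts message
```

This is why the corrected program below uses += 1. It also turns the constants into global variables, because Ruby rejects reassigning a constant inside a method body as a dynamic constant assignment.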
require 'nokogiri'
require 'open-uri'
# Crawler Params
# Globals instead of constants: a constant cannot be reassigned inside a method.
$START_URL = "https://en.wikipedia.org"
$CRAWLED_PAGES_COUNTER = 0
$CRAWLED_PAGES_LIMIT = 5
# Crawler Functions
def explore(url)
  if $CRAWLED_PAGES_COUNTER > $CRAWLED_PAGES_LIMIT
    return
  end
  # Ruby has no ++ operator; increment with += 1.
  $CRAWLED_PAGES_COUNTER += 1
  # Note: on Ruby 3.0 and later, use URI.open(url) here instead of open(url).
  currentPage = Nokogiri::HTML(open(url))
  links = currentPage.xpath('//@href').map(&:value)
  eval_page(currentPage)
  links.each do |link|
    puts link
    explore(link)
  end
end
def eval_page(page)
  puts page.title
end
# Start Crawling
explore($START_URL)
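If global variables are undesirable, one possible alternative is to keep the counter in a small object and pass it into the function. The sketch below is not part of the original answer. PageBudget, explore_with_budget, and FAKE_LINKS are hypothetical names, and the page fetch is replaced by a block over a stand-in link table so that the example runs without network access.

```ruby
# Hypothetical counter object; replaces $CRAWLED_PAGES_COUNTER and $CRAWLED_PAGES_LIMIT.
class PageBudget
  attr_reader :count

  def initialize(limit)
    @limit = limit
    @count = 0
  end

  # Mirrors the "counter > limit" check in explore.
  def exhausted?
    @count > @limit
  end

  def increment
    @count += 1
  end
end

# Same control flow as explore, with the link lookup supplied as a block.
def explore_with_budget(url, budget, visited = [], &fetch_links)
  return visited if budget.exhausted?
  budget.increment
  visited << url
  fetch_links.call(url).each do |link|
    explore_with_budget(link, budget, visited, &fetch_links)
  end
  visited
end

# Stand-in link graph used instead of real pages (an assumption for this demo).
FAKE_LINKS = { "a" => ["b", "c"], "b" => ["a"], "c" => [] }.freeze

budget = PageBudget.new(5)
visited = explore_with_budget("a", budget) { |url| FAKE_LINKS.fetch(url, []) }
puts visited.inspect
```

In a real crawler, the block would return currentPage.xpath('//@href').map(&:value) for the fetched page. Because the check uses > rather than >=, a limit of 5 lets six pages through, which matches the behavior of the program above.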
Regarding ruby - Nokogiri throws an exception inside a function but not outside it, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42633731/