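Suppose I want to fetch a page from the web into my application and parse it in some way. How do I do that? Where should I start? Do I need any plugins or gems? What is your usual approach to this kind of task?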
Best answer
You should try a gem such as Hpricot (wiki) or Nokogiri.
Hpricot example:
require 'open-uri'
require 'rubygems'
require 'hpricot'
html = Hpricot(open(an_url).read)
# This would search for any images inside a paragraph (XPath)
html.search('/html/body//p//img')
# This would search for any images with the class "test" (CSS selector)
html.search('img.test')
Nokogiri example:
require 'open-uri'
require 'rubygems'
require 'nokogiri'
# On Ruby 3.0+ use URI.open; plain Kernel#open no longer accepts URLs
html = Nokogiri::HTML(URI.open(an_url).read)
# This would search for any images inside a paragraph (XPath)
html.xpath('/html/body//p//img')
# This would search for any images with the class "test" (CSS selector)
html.css('img.test')
Nokogiri is generally faster. Both libraries offer a lot of functionality.
Source: the Stack Overflow question on loading a web page for parsing in Rails (ruby-on-rails): https://stackoverflow.com/questions/1469833/