ruby-on-rails - Whitelisting nodes in a transformer using Sanitize

Tags: ruby-on-rails ruby nokogiri sanitize

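I'm having some trouble using this example to create a transformer lambda with Ruby's Sanitize library.

I've gone through it and cobbled together a simple script that tries to clean whatever is in my options[:content] variable. The script reaches the point where it returns a hash containing an array of nodes under the key :node_whitelist, yet my nodes don't seem to end up whitelisted.

Here is my code: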

#!/usr/bin/ruby

require 'rubygems'
require 'sanitize'

options = { :content => "<p>Here is my content. It has a video: <object width='480' height='390'><param name='movie' value='http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US'></param><param name='allowFullScreen' value='true'></param><param name='allowscriptaccess' value='always'></param><embed src='http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US' type='application/x-shockwave-flash' allowscriptaccess='always' allowfullscreen='true' width='480' height='390'></embed></object></p>" }

# adapted from example at https://github.com/rgrove/sanitize/
video_embed_sanitizer = lambda do |env|
  node      = env[:node]
  node_name = env[:node_name]

  puts "[video_embed_sanitizer] Starting up"
  puts "[video_embed_sanitizer]   node is #{node}"
  puts "[video_embed_sanitizer]   node.name.to_s.downcase is #{node.name.to_s.downcase}"

  # Don't continue if this node is already whitelisted or is not an element.
  if env[:is_whitelisted] then
    puts "[video_embed_sanitizer]   Already whitelisted"
  end
  return nil if env[:is_whitelisted] || !node.element?

  parent = node.parent

  # Since the transformer receives the deepest nodes first, we look for a
  # <param> element or an <embed> element whose parent is an <object>.
  return nil unless (node.name.to_s.downcase == 'param' || node.name.to_s.downcase == 'embed') &&
    parent.name.to_s.downcase == 'object'

  if node.name.to_s.downcase == 'param'
    # Quick XPath search to find the <param> node that contains the video URL.
    return nil unless movie_node = parent.search('param[@name="movie"]')[0]
    url = movie_node['value']
  else
    # Since this is an <embed>, the video URL is in the "src" attribute. No
    # extra work needed.
    url = node['src']
  end

  # Verify that the video URL is actually a valid YouTube video URL.
  puts "[video_embed_sanitizer]   URL is #{url}"
  return nil unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  puts "[video_embed_sanitizer]   Node before cleaning is #{node}"
  Sanitize.clean_node!(parent, {
    :elements => %w[embed object param],

    :attributes => {
      'embed'  => %w[allowfullscreen allowscriptaccess height src type width],
      'object' => %w[height width],
      'param'  => %w[name value]
    }
  })
  puts "[video_embed_sanitizer]   Node after cleaning is #{node}"

  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node (<param> or <embed>) and its parent
  # (<object>).
  puts "[video_embed_sanitizer]   Marking node as whitelisted and returning"
  {:node_whitelist => [node, parent]}
end

options[:content] = Sanitize.clean(options[:content], :elements => ['a', 'b', 'blockquote', 'br', 'em', 'i', 'img', 'li', 'ol', 'p', 'span', 'strong', 'ul'],
                                    :attributes => {'a' => ['href', 'title'], 'span' => ['class', 'style'], 'img' => ['src', 'alt']},
                                    :protocols => {'a' => {'href' => ['http', 'https', :relative]}},
                                    :add_attributes => { 'a' => {'rel' => 'nofollow'}},
                                    :transformers => [video_embed_sanitizer])
puts options[:content]

Here is the output it generates:

[video_embed_sanitizer] Starting up
[video_embed_sanitizer]   node is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US">
[video_embed_sanitizer]   node.name.to_s.downcase is param
[video_embed_sanitizer]   URL is http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US
[video_embed_sanitizer]   Node before cleaning is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US">
[video_embed_sanitizer]   Node after cleaning is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US">
[video_embed_sanitizer]   Marking node as whitelisted and returning
[video_embed_sanitizer] Starting up
[video_embed_sanitizer]   node is <param name="allowFullScreen" value="true">
[video_embed_sanitizer]   node.name.to_s.downcase is param
[video_embed_sanitizer] Starting up
[video_embed_sanitizer]   node is <param name="allowscriptaccess" value="always">
[video_embed_sanitizer]   node.name.to_s.downcase is param
[video_embed_sanitizer] Starting up
[video_embed_sanitizer]   node is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer]   node.name.to_s.downcase is embed
[video_embed_sanitizer]   URL is http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US
[video_embed_sanitizer]   Node before cleaning is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer]   Node after cleaning is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer]   Marking node as whitelisted and returning
[video_embed_sanitizer] Starting up
[video_embed_sanitizer]   node is <object width="480" height="390"></object>
[video_embed_sanitizer]   node.name.to_s.downcase is object
[video_embed_sanitizer] Starting up
[video_embed_sanitizer]   node is <p>Here is my content. It has a video: </p>
[video_embed_sanitizer]   node.name.to_s.downcase is p
<p>Here is my content. It has a video: </p>

What am I doing wrong?

Best answer

I had trouble with the YouTube example too. Here is how I allowed script tags, but only for the Ooyala video player:

  1. Add 'script' to :elements
  2. Add 'script' => ['src'] to :attributes
  3. Use :transformers => lambda { |env| next unless env[:node_name] == 'script'; unless (env[:node]['src'] && env[:node]['src'].include?('http://player.ooyala.com')); Sanitize.clean_node!(env[:node], {}); end; nil }

I also tidied this up by creating my own config in an initializer:

class Sanitize
  module Config
    ULTRARELAXED = {
      :elements => [
        'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
        'colgroup', 'dd', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
        'i', 'img', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong',
        'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'u',
        'ul', 'object', 'embed', 'param', 'iframe', 'script'],

      :attributes => {
        'a'          => ['href', 'title'],
        'blockquote' => ['cite'],
        'col'        => ['span', 'width'],
        'colgroup'   => ['span', 'width'],
        'img'        => ['align', 'alt', 'height', 'src', 'title', 'width'],
        'ol'         => ['start', 'type'],
        'q'          => ['cite'],
        'table'      => ['summary', 'width'],
        'td'         => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
        'th'         => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
                         'width'],
        'ul'         => ['type'],
        'object' => ['width', 'height'],
        'param'  => ['name', 'value'],
        'embed'  => ['src', 'type', 'allowscriptaccess', 'allowfullscreen', 'width', 'height', 'flashvars'],
        'iframe' => ['src', 'width', 'height', 'frameborder'],
        'script' => ['src']
      },

      :protocols => {
        'a'          => {'href' => ['ftp', 'http', 'https', 'mailto', :relative]},
        'blockquote' => {'cite' => ['http', 'https', :relative]},
        'img'        => {'src'  => ['http', 'https', :relative]},
        'q'          => {'cite' => ['http', 'https', :relative]}
      },

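      :transformers => lambda { |env|
        # Only <script> nodes need special handling.
        next unless env[:node_name] == 'script'
        src = env[:node]['src']
        # Re-clean with an empty config any <script> that does not load
        # the Ooyala player.
        unless src && src.include?('http://player.ooyala.com')
          Sanitize.clean_node!(env[:node], {})
        end
        nil
      }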
    }
  end
end

Sanitize.clean(html, Sanitize::Config::ULTRARELAXED)
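
One caveat about the include? test in the transformer above: it accepts any src that contains the player URL anywhere in the string, for example inside a query parameter of another site. Below is a minimal sketch of a stricter host check, assuming only Ruby's standard uri library. The helper name ooyala_player_src? is hypothetical and not part of Sanitize.

```ruby
require 'uri'

# Hypothetical helper (not part of Sanitize): returns true only when src is
# an http(s) URL whose host is exactly player.ooyala.com.
def ooyala_player_src?(src)
  return false if src.nil? || src.empty?

  uri = URI.parse(src)
  %w[http https].include?(uri.scheme) &&
    uri.host.to_s.downcase == 'player.ooyala.com'
rescue URI::InvalidURIError
  # Anything that does not parse as a URI is treated as not allowed.
  false
end
```

Inside the transformer, the condition could then read unless ooyala_player_src?(env[:node]['src']). Whether https should be accepted is a judgment call; the original check only mentions http.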

Regarding ruby-on-rails - Whitelisting nodes in a transformer using Sanitize, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5504200/
