javascript - Parsing data from the JavaScript of a retrieved page

I am retrieving a web page with OpenURI:

require 'open-uri'
# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open instead.
page = URI.open('http://www.example.com').read.scrub

Now I want to parse the values of the properties playerurl, playerdata, and pageurl from the retrieved page. They appear inside a <script> tag:

<script>
..
..
  PlayerWatchdog.init({
      'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
      'playerdata': 'http://www.example.com/player',
      'pageurl': 'http://www.example.com?test=2',
      });
..
..
</script>

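What is the most sensible way to accomplish this?

Best answer

You can use an HTML parser such as Nokogiri to take the HTML document apart and quickly find the <script> tag you are looking for. The content inside a <script> tag is text, so Nokogiri's text method will return it. After that it is a matter of selectively retrieving the lines you want, which can be done with a simple regular expression: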

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <head>
    <script>
      PlayerWatchdog.init({
          'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
          'playerdata': 'http://www.example.com/player',
          'pageurl': 'http://www.example.com?test=2',
          });
    </script>
  </head>
</html>
EOT

# Grab the text of the first <script> tag in the document.
script_text = doc.at('script').text

# For each key, capture the quoted value that follows it.
playerurl, playerdata, pageurl = %w[
  playerurl
  playerdata
  pageurl
].map{ |i| script_text[/'#{ i }': '([^']+)'/, 1] }

playerurl # => "http://cdn.static.de/now/player.swf?ts=2011354353"
playerdata # => "http://www.example.com/player"
pageurl # => "http://www.example.com?test=2"

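at returns the first matching <script> as a Node instance. Depending on the HTML, the first matching <script> may not be the one you want. You can use search instead, which returns a NodeSet, similar to an array of nodes, and then grab a particular element from the NodeSet. Alternatively, you can use XPath instead of a CSS selector, which makes it easy to specify a particular occurrence of the tag you want.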

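Once the tag is found, text returns its contents, and the task shifts from Nokogiri to using a pattern to find what is wanted. /'#{ i }': '([^']+)'/ is a simple pattern that looks for the word passed in as i, followed by : ' and then captures everything up to the next '. The pattern is passed to String's [] method, and the second argument, 1, selects the first capture group.

Regarding "javascript - Parsing data from the JavaScript of a retrieved page", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26717894/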
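To illustrate how String's [] method behaves with a regular expression and a capture index, here is a small sketch that needs no HTML parser. The script_text sample and the extract helper are assumptions made for illustration, not part of the original answer. Regexp.escape and \s* are additions so that a key containing regex metacharacters, or a different amount of whitespace after the colon, would still match.

```ruby
# Sample script body, standing in for the text Nokogiri would return.
script_text = <<~JS
  PlayerWatchdog.init({
      'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
      'playerdata': 'http://www.example.com/player',
      'pageurl': 'http://www.example.com?test=2',
      });
JS

# Hypothetical helper: returns the quoted value that follows the quoted
# key, or nil when the key is not present in the text.
def extract(text, key)
  text[/'#{Regexp.escape(key)}':\s*'([^']+)'/, 1]
end

# Build a Hash of key => value for the three properties.
values = %w[playerurl playerdata pageurl].map { |key|
  [key, extract(script_text, key)]
}.to_h

puts values['pageurl']
puts extract(script_text, 'missing').inspect
```

Because String#[] returns nil when the pattern does not match, a missing key can be detected with a plain nil check rather than by rescuing an exception.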
