Ruby/Mechanize "failed to allocate memory". Erasing the instantiation of the 'agent.get' method?

I have a small memory leak problem in a Mechanize Ruby script.

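I use a while loop to visit several web pages forever, and memory grows a lot on every iteration. After a few minutes this produces a "failed to allocate memory" error and the script exits.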

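In fact, agent.get seems to instantiate and keep its result even when I assign that result to the same local variable, or even to a global variable. So I tried assigning nil to the variable after its last use and before reusing the same name. But the previous agent.get results still seem to stay in memory, and I really don't know how to release that RAM so that my script uses a roughly stable amount of memory over several hours.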

Here are two pieces of code (hold down the Enter key and watch the RAM allocated to Ruby keep growing):

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
GC.enable
#puts GC.malloc_allocations
while gets.chomp!="stop"
    page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    page = agent.get 'http://www.nypost.com/'
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    puts local_variables
    GC.start
    puts local_variables
    #puts GC.malloc_allocations
end

And with a global variable instead:

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
while gets.chomp!="stop"
    $page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "$page.object_id  : "+$page.object_id.to_s
    $page = agent.get 'http://www.nypost.com/'
    puts "$page.object_id  : "+$page.object_id.to_s
    #puts local_variables
    #puts global_variables
end

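In other languages the variable is simply reassigned and the allocated memory stays stable. Why doesn't Ruby do that? How can I force the old instances to be garbage collected?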

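Edit: Here is another example using an object, since Ruby is an object-oriented language, but the result is exactly the same: memory grows again and again...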

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            remove_instance_variable(:@page)
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

My answer (I don't have enough reputation to post it properly)

Alright!

It seems that clearing the agent's history (Mechanize::History#clear, called as agent.history.clear) largely resolves the memory leak.
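A minimal sketch of why clearing the history would help, assuming the agent keeps a reference to every page it fetches. `ToyAgent` and its `history` array are hypothetical stand-ins written for illustration, not Mechanize's real classes, and no network access is involved.

```ruby
# Toy model only: ToyAgent is a hypothetical stand-in, not Mechanize itself.
class ToyAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page: build a new object and remember it.
  def get(url)
    page = "contents of #{url}"
    @history << page
    page
  end
end

agent = ToyAgent.new
3.times do
  page = agent.get('http://www.nypost.com/')
  page = nil # drops only this one reference to the page
end

puts agent.history.size # 3: every page is still reachable through the agent
agent.history.clear
puts agent.history.size # 0: nothing references the old pages any more
```

In this model, setting the variable to nil cannot free anything because the garbage collector only reclaims objects that are no longer reachable, and the history still points at each page.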

If you want to test the before and after, here is the final modified Ruby code...

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

Best answer

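My suggestion is to set agent.max_history = 0, as mentioned in the list of linked issues.

This prevents history entries from being kept in the first place, as opposed to removing them afterwards with #clear.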
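The difference between an unlimited and a capped history can be sketched as follows. `BoundedHistory` is a hypothetical class written for illustration; it models the idea of a size cap and is not a copy of Mechanize's implementation.

```ruby
# Hypothetical model of a size-capped history; not Mechanize's own code.
class BoundedHistory
  def initialize(max_size)
    @max_size = max_size # nil means "no limit"
    @entries = []
  end

  def push(entry)
    @entries << entry
    # Discard the oldest entries until the cap is respected.
    @entries.shift while @max_size && @entries.size > @max_size
    self
  end

  def size
    @entries.size
  end
end

unlimited = BoundedHistory.new(nil)
capped = BoundedHistory.new(0)
5.times do |i|
  unlimited.push("page #{i}")
  capped.push("page #{i}")
end

puts unlimited.size # 5: grows with every request
puts capped.size    # 0: each entry is dropped as soon as it is added
```

With a cap of zero there is no need to remember to call clear inside the loop, which is the advantage this answer points at.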

Here is a modified version of the other answer:

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
$agent.max_history = 0
class GetContent
    def initialize url
        while true
            @page = $agent.get url
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
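To compare the variants, one option is to count live objects of a given class instead of watching the process in a system monitor. The helper below is a rough sketch: `live_instances` and `ToyPage` are hypothetical names, and `ToyPage` stands in for a page class so the example runs without Mechanize or a network.

```ruby
# Hypothetical helper: count live instances of a class after a GC pass.
def live_instances(klass)
  GC.start
  ObjectSpace.each_object(klass).count
end

# Stand-in class so the sketch runs on its own.
class ToyPage; end

retained = Array.new(100) { ToyPage.new }
puts live_instances(ToyPage) # all 100 are counted while `retained` holds them

retained.clear
puts live_instances(ToyPage) # usually far lower; the exact value depends on the GC
```

In a real script the same helper could presumably be called with Mechanize::Page every few iterations; a count that stays roughly flat would suggest that old pages are being released.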

Source: this question on Stack Overflow, https://stackoverflow.com/questions/7191752/
