capistrano - delayed_job monitored by God - duplicate processes after restart

Tags: capistrano delayed-job god

I'm monitoring delayed_job with God. Here is my God config file:

  QUEUE = "slow"
  WORKERS = 14

  WORKERS.times do |num|
    God.watch do |w|
      w.name = "dj.#{num}"
      w.group = "tanda"

      w.uid = 'deployer'
      w.gid = 'deployer'

      w.start = "cd #{RAILS_ROOT}; RAILS_ENV=#{RAILS_ENV} bundle exec script/delayed_job start --queue=#{QUEUE} --pid-dir=#{RAILS_ROOT}/tmp/pids -i #{num}"
      w.restart = "cd #{RAILS_ROOT}; RAILS_ENV=#{RAILS_ENV} bundle exec script/delayed_job restart --queue=#{QUEUE} --pid-dir=#{RAILS_ROOT}/tmp/pids -i #{num}"
      w.stop = "cd #{RAILS_ROOT}; RAILS_ENV=#{RAILS_ENV} bundle exec script/delayed_job stop -i #{num}"

      w.start_grace = 30.seconds
      w.restart_grace = 30.seconds
      w.stop_grace = 30.seconds

      w.pid_file = "#{RAILS_ROOT}/tmp/pids/delayed_job.#{num}.pid"
      w.log = "#{RAILS_ROOT}/log/dj.#{num}.log"
      w.err_log = "#{RAILS_ROOT}/log/dj.#{num}.errors.log"

      w.behavior(:clean_pid_file)
      w.interval = 30.seconds
      w.dir = File.expand_path('.')

      w.env = {
        "RACK_ENV" => RAILS_ENV,
        "RAILS_ENV" => RAILS_ENV,
        "CURRENT_DIRECTORY" => RAILS_ROOT
      }

      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.interval = 5.seconds
          c.running = false
        end
      end

      w.lifecycle do |on|
        on.condition(:flapping) do |c|
          c.to_state = [:start, :restart]
          c.times = 10
          c.within = 3.minutes
          c.transition = :unmonitored
          c.retry_in = 10.minutes
        end
      end
    end
  end

I then restart these processes on every deploy with Capistrano 2:

run("cd #{current_path} && rvmsudo god restart tanda")
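For context, a minimal sketch of where a line like that might sit in a Capistrano 2 `config/deploy.rb`. The namespace and task name `god:restart_workers` are hypothetical, and hooking it after `deploy:restart` is an assumption about when the restart is triggered; only the `run` command itself comes from the question.

```ruby
# config/deploy.rb (Capistrano 2). Hypothetical task name, shown for context only.
namespace :god do
  desc "Restart the 'tanda' group of delayed_job workers managed by God"
  task :restart_workers, :roles => :app do
    # Same command as in the question: ask God to restart every watch in the group
    run "cd #{current_path} && rvmsudo god restart tanda"
  end
end

# Assumption: restart the workers once the app itself has been restarted
after "deploy:restart", "god:restart_workers"
```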

When I start God, my ps output looks like this:

ps -e -www -o pid,rss,command | grep delayed
31960 220804 delayed_job.0
31966 220152 delayed_job.8
31973 226012 delayed_job.9
31979 215176 delayed_job.1
31984 210260 delayed_job.13
31994 240424 delayed_job.3
31997 225248 delayed_job.11
32003 196364 delayed_job.5
32009 236192 delayed_job.6
32015 214540 delayed_job.12
32022 247096 delayed_job.4
32029 206352 delayed_job.2
32047 232748 delayed_job.7
32061 228128 delayed_job.10

If I immediately run the restart again through Capistrano, without deploying or doing anything else, a minute later it looks like this:

ps -e -www -o pid,rss,command | grep delayed
 9884 198076 delayed_job.10
 9895 195372 delayed_job.0
 9919 196856 delayed_job.6
 9948 196772 delayed_job.5
 9964 196568 delayed_job.9
 9973 194092 delayed_job.12
 9982 195648 delayed_job.13
 9997 196392 delayed_job.2
10005 195356 delayed_job.4
10016 197268 delayed_job.3
10032 198820 delayed_job.8
10054 194316 delayed_job.7
10078 196780 delayed_job.11
10127 202420 delayed_job.1
10133 197468 delayed_job.1
10145 194040 delayed_job.1
10158 195760 delayed_job.1
10173 195844 delayed_job.1

After another restart:

ps -e -www -o pid,rss,command | grep delayed
 9884 221780 delayed_job.10
 9973 225100 delayed_job.12
 9982 224708 delayed_job.13
10078 235076 delayed_job.11
21467 187056 delayed_job.0
21483 187844 delayed_job.7
21497 189648 delayed_job.10
21509 187316 delayed_job.2
21518 188180 delayed_job.11
21527 187968 delayed_job.3
21542 187852 delayed_job.12
21546 186900 delayed_job.13
21556 188628 delayed_job.5
21565 187816 delayed_job.9
21574 185216 delayed_job.4
21585 188088 delayed_job.1
21599 188556 delayed_job.1
21602 188400 delayed_job.1
21615 193484 delayed_job.1
21628 193288 delayed_job.8
21632 188228 delayed_job.1
21643 187804 delayed_job.6

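As you can see, the duplicated processes sometimes get new PIDs (e.g. every process between the first and second dumps), but sometimes they don't (e.g. DJ 10 between the second and third dumps).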
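Comparing dumps like these by eye makes it easy to miss a duplicate. Below is a small, hypothetical Ruby helper (not part of God or delayed_job) that counts how many processes share each `delayed_job.*` name in `ps -e -www -o pid,rss,command` output and keeps only the names seen more than once. The sample string is the tail of the second dump above.

```ruby
# Count processes per delayed_job name in `ps -o pid,rss,command` output
# and return only the names that appear more than once.
def duplicate_workers(ps_output)
  counts = Hash.new(0)
  ps_output.each_line do |line|
    _pid, _rss, name = line.split
    # Skip the header row, the grep process itself, and anything unrelated
    next unless name && name.start_with?("delayed_job.")
    counts[name] += 1
  end
  counts.select { |_name, count| count > 1 }
end

# Tail of the second `ps` dump from the question
sample = <<~PS
  10078 196780 delayed_job.11
  10127 202420 delayed_job.1
  10133 197468 delayed_job.1
  10145 194040 delayed_job.1
  10158 195760 delayed_job.1
  10173 195844 delayed_job.1
PS

duplicate_workers(sample).each { |name, count| puts "#{name}: #{count}" }
# prints "delayed_job.1: 5"
```

On the server itself, the same function could be given `` `ps -e -www -o pid,rss,command` `` (Ruby's backtick shell call) instead of the sample string.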

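I really don't know where to start debugging this. God doesn't report any errors on restart, and the DJ logs only show the normal output from starting the processes. The same thing does not happen on a smaller server that only runs 4 workers (but is otherwise identical).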

Has anyone seen this before?

Best answer

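I think this must be a problem in the daemons gem, which delayed_job uses to run its workers in the background, because adding this to the top of my God file seems to have fixed things: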

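ids = ('a'..'z').to_a
WORKERS.times do |num|
  num = ids[num] # 0 -> "a", 1 -> "b", ... (only enough letters for 26 workers)
  # ... rest of the God.watch block unchanged ...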

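There seems to be a problem where processes named delayed_job.1 and delayed_job.11 (and so on) collide, which causes all sorts of trouble. I haven't isolated it much further, but switching to a different naming convention (in this case delayed_job.a) has fixed the problem for me for now.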
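The dumps in the question fit this explanation: the second and third dumps each contain five `delayed_job.1` processes, and exactly five of the 14 configured names (`1`, `10`, `11`, `12`, `13`) begin with `delayed_job.1`, while the 4-worker server only has single-digit ids. The sketch below tests that hypothesis. It assumes, without confirming it from the daemons gem source, that something in the start/stop/restart path finds PID files by prefix, as a shell-style glob such as `delayed_job.1*.pid` would. It uses Ruby's `File.fnmatch` to show which PID files such a glob would catch under three naming schemes: the original numbers, the answer's letters, and zero-padded numbers (the last is an untested alternative, included only for comparison).

```ruby
# Hypothesis check, NOT the daemons gem's actual lookup code: if the PID file
# for worker `id` were found with the glob "delayed_job.<id>*.pid", which
# workers' PID files would that glob also match?
WORKERS = 14

def pid_files(ids)
  ids.map { |id| "delayed_job.#{id}.pid" }
end

# Every PID file matched by the prefix glob for worker `id`
def glob_matches(files, id)
  files.select { |f| File.fnmatch("delayed_job.#{id}*.pid", f) }
end

numeric  = pid_files((0...WORKERS).to_a)                          # 0..13 (question)
lettered = pid_files(('a'..'z').first(WORKERS))                   # a..n (answer)
padded   = pid_files((0...WORKERS).map { |n| format("%02d", n) }) # 00..13

puts glob_matches(numeric, 1).join(" ")
# delayed_job.1.pid delayed_job.10.pid delayed_job.11.pid delayed_job.12.pid delayed_job.13.pid
puts glob_matches(lettered, "b").join(" ")
# delayed_job.b.pid
puts glob_matches(padded, "01").join(" ")
# delayed_job.01.pid
```

Under this assumption, any scheme in which no worker name is a prefix of another (single letters, or numbers padded to a fixed width) avoids the collision.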

I'll leave this open in case anyone has a better solution or an explanation of why this works.

Source: capistrano - delayed_job monitored by God - duplicate processes after restart, on Stack Overflow: https://stackoverflow.com/questions/33726792/
