Create Zombie process on ruby 2.1.0 after start #23

Open
boris opened this issue Oct 9, 2014 · 1 comment


boris commented Oct 9, 2014

I'm using rbenv ruby 2.1.0, bluepill 0.0.68 with this pill file:

Bluepill.application("resque") do |a|
  a.working_dir = "/path/to/current"
  a.process("resque") do |p|
    p.start_command = "bundle exec rake workers:start RAILS_ENV=production WORKERS_ENV=true"
    p.pid_file = "/path/to/shared/pids/workers.pid"
    p.uid = "deploy"
    p.gid = "deploy"
  end
end

Every time I run bluepill restart resque, the process restarts properly, but it leaves behind a zombie child process: http://cl.ly/image/0V3o0U1V3015

@jschneiderhan

I'm seeing this too on 0.0.69, although with a different configuration (unicorn with daemonize=true), and I think I know what's going on, at least in my case.

The Bluepill::System#execute_blocking method forks https://github.com/bluepill-rb/bluepill/blob/v0.0.69/lib/bluepill/system.rb#L154 and execs the process command (in my case, unicorn).

The parent calls waitpid() on the child https://github.com/bluepill-rb/bluepill/blob/v0.0.69/lib/bluepill/system.rb#L194. If this waitpid() call is not made, the child stays in the zombie state and is never released.

The Bluepill::Process class calls #execute_blocking in a few places, such as https://github.com/bluepill-rb/bluepill/blob/v0.0.69/lib/bluepill/process.rb#L296, but when it does, the call is wrapped in a with_timeout block.

If the timeout is reached before the application process (unicorn) completes, waitpid https://github.com/bluepill-rb/bluepill/blob/v0.0.69/lib/bluepill/system.rb#L194 is never called, so the child remains a zombie.
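The mechanism can be reproduced outside Bluepill with a minimal sketch. This is not Bluepill's code: demo_unreaped_child, its parameters, and the durations are hypothetical, and Ruby's standard Timeout module stands in for with_timeout. It forks a child that outlives the timeout, lets the timeout interrupt waitpid, and then shows that the child's exit status is still sitting there unreaped.

```ruby
require "timeout"

# Hypothetical stand-in for a fork + waitpid helper wrapped in a timeout.
# This is NOT Bluepill's code; the name and durations are made up for the demo.
def demo_unreaped_child(timeout_s: 0.1, child_runtime_s: 0.3)
  pid = nil
  timed_out = false
  begin
    Timeout.timeout(timeout_s) do
      # The child deliberately runs longer than the timeout.
      pid = fork do
        sleep child_runtime_s
        exit!(0) # skip at_exit handlers inherited from the parent
      end
      Process.waitpid(pid) # interrupted when Timeout::Error is raised
    end
  rescue Timeout::Error
    timed_out = true
  end

  # Give the child time to exit. Nobody has waited on it, so the kernel
  # keeps its exit status around: that is the zombie.
  sleep(child_runtime_s * 2)

  # A late, non-blocking waitpid still finds the exit status and reaps it.
  reaped_pid = Process.waitpid(pid, Process::WNOHANG)
  { timed_out: timed_out, reaped_late: reaped_pid == pid }
end

p demo_unreaped_child
```

Between the timeout and the late waitpid call, the child shows up as defunct in ps. In a long-running daemon that never makes that late call, it would stay that way until the daemon itself exits.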

I ended up solving this by increasing my start_grace_time value.
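For reference, start_grace_time is set per process in the pill file. A sketch based on the pill file from the original report; the 30-second value is only an illustrative assumption, so choose something comfortably longer than the real startup time of your process.

```ruby
Bluepill.application("resque") do |a|
  a.working_dir = "/path/to/current"
  a.process("resque") do |p|
    p.start_command = "bundle exec rake workers:start RAILS_ENV=production WORKERS_ENV=true"
    p.pid_file = "/path/to/shared/pids/workers.pid"
    # Assumed value for illustration: long enough that the start command
    # finishes before the timeout, so the forked child gets waited on.
    p.start_grace_time = 30.seconds
  end
end
```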
