
Configure apache balancer with up to 10 members at startup #14007

Merged

Conversation

jrafanie
Member

https://bugzilla.redhat.com/show_bug.cgi?id=1422988

Start UI, Web Service, Web Socket, etc. puma workers bound to a port
from STARTING_PORT to the maximum worker count port (3000 to 3009 if the
max worker count is 10). Configure apache at boot with these ports as
balancer members.
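
A rough sketch of how that port range works out (illustrative Ruby only, not the actual ManageIQ code; the names follow the description above):

STARTING_PORT         = 3000
maximum_workers_count = 10
ports = (STARTING_PORT...(STARTING_PORT + maximum_workers_count)).to_a
# => [3000, 3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009]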

Fixes a failure after we start new puma workers and try to gracefully
restart apache. The next request will fail since apache is waiting for
active connections to close before restarting. The subsequent request will
then succeed since the failure causes the websocket connections to
close, allowing apache to restart fully.

Previously, we would add and remove members in the balancer configuration
when starting or stopping puma workers. We would then gracefully restart
apache since the new workers wouldn't be used until apache reloaded the
configuration. Note, we didn't do anything after removing members from
the balancer configuration because apache's mod_proxy_balancer gracefully
handles dead members by marking them as in Error and not retrying them for
60 seconds by default. Therefore, it's not necessary to restart apache to
"remove" members.

The problem arose when we would add balancer members to the
configuration and gracefully restart apache. It turns out our web
socket workers maintain active connections to apache, so apache wouldn't
restart until those connections were closed.

Now, we lean on the mod_proxy_balancer behavior mentioned above, which
keeps track of which members are alive or in error, by configuring up to
maximum_workers_count (10) members at server startup. We can then
start and stop workers and let apache route traffic to the members that
are alive. We no longer have to update the apache configuration and
restart it when a worker starts or stops.

Note, apache has a graceful reload option that could allow us to
maintain an accurate list of balancer members as workers start and stop
and tell apache workers to gracefully reload the configuration. This
option was buggy until fixed in [1]. It also required us to keep
touching the balancer configuration, which we probably shouldn't have been
doing in the first place.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=44736

MiqUiWorker.install_apache_proxy_config
MiqWebServiceWorker.install_apache_proxy_config
MiqWebsocketWorker.install_apache_proxy_config
MiqApache::Control.restart
jrafanie (Member Author)

Restart apache after configuring the workers/balancers.

@@ -89,8 +93,6 @@ def sync_workers
        end
      end

-      modify_apache_ports(ports_hash, self::PROTOCOL) if MiqEnvironment::Command.supports_apache?

jrafanie (Member Author)
We no longer modify the configuration after adding/removing Ui/Web Service/Web Socket workers...

abellotti (Member)

Is the selectable number of workers (for each type) limited to 10 in the UI?

jrafanie (Member Author)

@abellotti good question. The UI shows up to 9. Although, with advanced settings, you can choose more. This is why the maximum_workers_count is set to 10, so even if you choose more, the system won't let you go beyond that max value.
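
A hypothetical sketch of that cap (illustrative Ruby; the variable names are not the exact ManageIQ settings):

maximum_workers_count = 10
requested_count       = 12   # e.g. raised via advanced settings
effective_count       = [requested_count, maximum_workers_count].min
# => 10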

@jrafanie (Member Author)

@bdunne @carbonin @Fryguy @gtanzillo please review.

@jrafanie jrafanie force-pushed the set_apache_balancer_once_at_startup branch from 7041b0e to bac1ce0 Compare February 21, 2017 21:10
@bdunne (Member) commented Feb 21, 2017

@jrafanie Can we get rid of even more of this code if we just ship a static file containing the balancer member list?

@jrafanie (Member Author)

@jrafanie Can we get rid of even more of this code if we just ship a static file containing the balancer member list?

@bdunne So, it looks like this:

<Proxy balancer://evmcluster_ui/ lbmethod=byrequests>
BalancerMember http://0.0.0.0:3000
BalancerMember http://0.0.0.0:3001
BalancerMember http://0.0.0.0:3002
BalancerMember http://0.0.0.0:3003
BalancerMember http://0.0.0.0:3004
BalancerMember http://0.0.0.0:3005
BalancerMember http://0.0.0.0:3006
BalancerMember http://0.0.0.0:3007
BalancerMember http://0.0.0.0:3008
BalancerMember http://0.0.0.0:3009
</Proxy>

evmcluster_ui is dynamic; it doesn't need to be.
lbmethod=byrequests is dynamic (by configuration).
The port value is dynamic based on the type of worker; it doesn't need to be.

We could ship the file if we dropped the dynamic nature of those values.
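
A minimal Ruby sketch of how those dynamic values feed into the generated block (hypothetical, for illustration only):

worker_type = "ui"          # dynamic: evmcluster_ui, evmcluster_ws, ...
lb_method   = "byrequests"  # dynamic: comes from configuration
ports       = 3000..3009    # dynamic: STARTING_PORT up to the max worker count port

config = "<Proxy balancer://evmcluster_#{worker_type}/ lbmethod=#{lb_method}>\n"
ports.each { |port| config << "BalancerMember http://0.0.0.0:#{port}\n" }
config << "</Proxy>\n"
puts config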

@jrafanie jrafanie force-pushed the set_apache_balancer_once_at_startup branch 3 times, most recently from 8fb8266 to e9f583e Compare February 23, 2017 19:38
@jrafanie jrafanie force-pushed the set_apache_balancer_once_at_startup branch from e9f583e to da9523e Compare February 23, 2017 19:46
@miq-bot commented Feb 23, 2017

Checked commit jrafanie@da9523e with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 0 offenses detected
Everything looks good. 🍪

@jrafanie (Member Author) commented Mar 8, 2017

@skateman Can you review this too?

@skateman (Member) commented Mar 9, 2017

@jrafanie looks like websockets are working with any possible number of workers. Not sure what else I should test... 👍

@jrafanie (Member Author)

@gtanzillo @carbonin I think this is ready to go, what do you think?

@carbonin (Member) left a comment

I'm for it! 👍

@gtanzillo (Member) left a comment

I'm good with this 👍

@gtanzillo gtanzillo added this to the Sprint 56 Ending Mar 13, 2017 milestone Mar 10, 2017
@gtanzillo gtanzillo merged commit 8518e63 into ManageIQ:master Mar 10, 2017
@jrafanie jrafanie deleted the set_apache_balancer_once_at_startup branch March 12, 2017 20:42
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Mar 13, 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1422988

Fixes a regression in ManageIQ#14007 that affected the initial start of the
appliance and caused a 503 error when trying to access the UI.

Because adding balancer members validates the configuration files, and
these files try to load the redirect files among others, we need to add
the balancer members after all configuration files have been written by
install_apache_proxy_config.
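
In other words, the startup ordering has to look roughly like this (sketch only; add_balancer_members below is a placeholder name for the step that writes the balancer member entries):

# Write all proxy/redirect configuration files first...
MiqUiWorker.install_apache_proxy_config
MiqWebServiceWorker.install_apache_proxy_config
MiqWebsocketWorker.install_apache_proxy_config
# ...then add the balancer members, so the validation performed while adding
# them can find the redirect files, and finally restart apache.
add_balancer_members  # placeholder, not the actual method name
MiqApache::Control.restart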
simaishi pushed a commit that referenced this pull request Mar 15, 2017
…tartup

Configure apache balancer with up to 10 members at startup
(cherry picked from commit 8518e63)

https://bugzilla.redhat.com/show_bug.cgi?id=1432463
@simaishi (Contributor)

Euwe backport details:

$ git log -1
commit 44c8abf7d0e22f167b2b61976f34f6d8d39eec7a
Author: Gregg Tanzillo <[email protected]>
Date:   Fri Mar 10 17:11:25 2017 -0500

    Merge pull request #14007 from jrafanie/set_apache_balancer_once_at_startup
    
    Configure apache balancer with up to 10 members at startup
    (cherry picked from commit 8518e63699d4f7223d16a5461315270e0143abdf)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1432463

simaishi pushed a commit that referenced this pull request Mar 16, 2017
In Docker image, need to leave log/apache dir and delete log/*.log only
(cherry picked from commit 769e4df)

Change was needed due to behavior change after #14007
simaishi pushed a commit to ManageIQ/manageiq-pods that referenced this pull request Mar 16, 2017
Need to leave log/apache dir, delete log/*.log only
(cherry picked from commit 7293dd4)

Change was needed due to behavior change after
ManageIQ/manageiq#14007
carbonin pushed a commit that referenced this pull request May 17, 2017
(which is the only caller of restart_apache)
#14007