Configure apache balancer with up to 10 members at startup #14007
Conversation
MiqUiWorker.install_apache_proxy_config
MiqWebServiceWorker.install_apache_proxy_config
MiqWebsocketWorker.install_apache_proxy_config
MiqApache::Control.restart
Restart apache after configuring the workers/balancers.
@@ -89,8 +93,6 @@ def sync_workers
      end
    end

-   modify_apache_ports(ports_hash, self::PROTOCOL) if MiqEnvironment::Command.supports_apache?
We no longer modify the configuration after adding/removing Ui/Web Service/Web Socket workers...
Is the selectable number of workers (for each type) limited to 10 in the UI?
@abellotti good question. The UI shows up to 9, although with advanced settings you can choose more. That's why maximum_workers_count is set to 10: even if you choose more, the system won't let you go beyond that max value.
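A minimal sketch of the cap described above, using illustrative names rather than the actual ManageIQ settings code:

```ruby
# Hypothetical sketch: clamp the configured worker count to the hard cap.
# MAXIMUM_WORKERS_COUNT and effective_worker_count are illustrative names.
MAXIMUM_WORKERS_COUNT = 10

def effective_worker_count(configured_count)
  # Advanced settings may request more workers, but apache is only
  # configured with MAXIMUM_WORKERS_COUNT balancer members at startup,
  # so anything beyond the cap would never receive traffic.
  [configured_count, MAXIMUM_WORKERS_COUNT].min
end

effective_worker_count(9)  # => 9
effective_worker_count(25) # => 10
```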
@bdunne @carbonin @Fryguy @gtanzillo please review.
Force-pushed from 7041b0e to bac1ce0
@jrafanie Can we get rid of even more of this code if we just ship a static file containing the balancer member list?
@bdunne So, it looks like this:
evmcluster_ui is dynamic, but it doesn't need to be. We could ship the file if we dropped the dynamic nature of those values.
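A sketch of what shipping a static member list could look like, assuming STARTING_PORT is 3000 and the cap is 10; the balancer name mirrors the discussion, but the file path and the code itself are illustrative, not the shipped implementation:

```ruby
# Hypothetical sketch: write a static balancer config listing every
# possible member up front, so apache never needs reconfiguration at
# runtime as workers start and stop.
STARTING_PORT         = 3000
MAXIMUM_WORKERS_COUNT = 10

lines = ["<Proxy balancer://evmcluster_ui/>"]
(STARTING_PORT...(STARTING_PORT + MAXIMUM_WORKERS_COUNT)).each do |port|
  # Dead members are tolerated: mod_proxy_balancer marks them as in
  # error and retries them later, so listing unused ports is harmless.
  lines << "  BalancerMember http://0.0.0.0:#{port}"
end
lines << "</Proxy>"

File.write("/etc/httpd/conf.d/manageiq-balancer-ui.conf", lines.join("\n") + "\n")
```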
Force-pushed from 8fb8266 to e9f583e
Force-pushed from e9f583e to da9523e
Checked commit jrafanie@da9523e with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
@skateman Can you review this too?
@jrafanie looks like websockets are working with any possible number of workers. Not sure what else I should test... 👍
@gtanzillo @carbonin I think this is ready to go, what do you think?
I'm for it! 👍
I'm good with this 👍
https://bugzilla.redhat.com/show_bug.cgi?id=1422988 Fixes a regression in ManageIQ#14007 that affects the initial start of the appliance, which caused a 503 error when trying to access the UI. Because adding balancer members triggers a validation of the configuration files, and these files try to load the redirect files among others, we need to add the balancer members after all configuration files have been written by install_apache_proxy_config.
…tartup Configure apache balancer with up to 10 members at startup (cherry picked from commit 8518e63) https://bugzilla.redhat.com/show_bug.cgi?id=1432463
Euwe backport details:
Need to leave log/apache dir, delete log/*.log only (cherry picked from commit 7293dd4) Change was needed due to behavior change after ManageIQ/manageiq#14007
(which is the only caller of restart_apache) #14007
https://bugzilla.redhat.com/show_bug.cgi?id=1422988
Start Ui, Web Service, Web Socket, etc. puma workers bound to a port
from STARTING_PORT to the maximum worker count port (3000 to 3009 if max
worker count is 10). Configure apache at boot with these ports as
balancer members.
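Concretely, the member port range implied by that description (values as stated above, the snippet itself is illustrative):

```ruby
# Illustrative: the balancer member ports described in the commit message.
STARTING_PORT         = 3000
maximum_workers_count = 10

ports = (STARTING_PORT...(STARTING_PORT + maximum_workers_count)).to_a
# => [3000, 3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009]
```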
Fixes a failure after we start new puma workers and try to gracefully
restart apache. The next request will fail since apache is waiting for
active connections to close before restarting. The subsequent request will
then be ok since the failure would cause the websocket connections to
close, allowing apache to restart fully.
Previously, we would add and remove members in the balancer configuration
when starting or stopping puma workers. We would then gracefully restart
apache since the new workers wouldn't be used until apache reloaded the
configuration. Note, we didn't do anything after removing members from
the balancer configuration because apache's mod_proxy_balancer gracefully
handles dead members by marking them as in Error and not retrying them for
60 seconds by default. Therefore, it's not necessary to restart apache to
"remove" members.
The problem is when we would try to add balancer members to the
configuration and gracefully restart apache. It turns out, our web
socket workers maintain active connections to apache so apache wouldn't
restart until those connections were closed.
Now, we take the idea mentioned above of mod_proxy_balancer
keeping track of which members are alive or in error by configuring up
to 10 (maximum_workers_count) members at server startup. We can then
start and stop workers and let apache route traffic to the members that
are alive. We no longer have to update the apache configuration and
restart it when a worker starts or stops.
Note, apache has a graceful reload option that could allow us to
maintain an accurate list of balancer members as workers start and stop
and tell apache workers to gracefully reload the configuration. This
option was buggy until fixed in [1]. It also required us to keep
touching the balancer configuration, which we probably shouldn't have been
doing in the first place.
[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=44736
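For reference, the graceful restart discussed throughout is apache's standard graceful operation; a minimal sketch of invoking it from Ruby (this wrapper is illustrative, the actual code goes through MiqApache::Control):

```ruby
# Illustrative: a graceful restart re-reads apache's configuration, but
# each worker process waits for its current connections to finish first.
# Long-lived websocket connections can therefore stall the restart,
# which is the failure mode this change avoids.
system("apachectl", "graceful")
```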