diff --git a/doc/operation-and-maintenance/Cluster-restart.md b/doc/operation-and-maintenance/Cluster-restart.md new file mode 100644 index 0000000000..f8eae73fee --- /dev/null +++ b/doc/operation-and-maintenance/Cluster-restart.md @@ -0,0 +1,25 @@ +When a MongooseIM cluster uses the Mnesia backend for any of its extensions, issues related to the distributed Mnesia nodes can occur during a cluster restart. + +## How to restart a cluster: + +Having Node A and Node B, the cluster restart procedure should occur in the following way: + +![How to restart a cluster](cluster_restart.png) + +Start the nodes in the opposite order to the one in which they were stopped. +The first node you restart should be the last one to go down. +For a cluster with 3 nodes, after stopping the nodes `ABC`, they should be started in `CBA` order. + +## How NOT to restart a cluster: + +Consider Node A and Node B again. + +![How not to restart a cluster](incorrect_cluster_restart.png) + +When the nodes are stopped in `AB` order, starting the node `A` first can result in issues related to the distributed Mnesia nodes and may fail to bring up a fully operational node. + +Changing the order of the restarted nodes can cause issues with distributed Mnesia. +Make sure to follow the recommendations if you are using the Mnesia backend for any of the extensions. +Please note that for some of the extensions, the Mnesia backend is set by default without having that configured explicitly in the configuration file. + +For more information related to the cluster configuration and maintenance, please see the [Cluster configuration and node management](Cluster-configuration-and-node-management.md) section.
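The ordering rule from the Cluster restart page (start the nodes in the reverse of the order in which they were stopped) can be illustrated with a small shell helper. This is only a sketch: `start_order` is a hypothetical function name, and `A`, `B`, `C` are the placeholder node labels used on that page, not real hosts. It only computes the order and does not start anything.

```bash
# Print the order in which nodes should be started, given the order
# in which they were stopped (the last node stopped is started first).
# start_order is a hypothetical helper, not part of mongooseimctl.
start_order() {
  reversed=""
  for node in "$@"; do
    # Prepend each node so that the list ends up reversed.
    reversed="$node${reversed:+ $reversed}"
  done
  printf '%s\n' "$reversed"
}

# Nodes were stopped in the order A, B, C.
start_order A B C   # prints: C B A
```

Recording the stop order while shutting the cluster down makes it possible to feed the same list into a helper like this when bringing the nodes back up.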
diff --git a/doc/operation-and-maintenance/Reloading-configuration-on-a-running-system.md b/doc/operation-and-maintenance/Reloading-configuration-on-a-running-system.md deleted file mode 100644 index ba889ffead..0000000000 --- a/doc/operation-and-maintenance/Reloading-configuration-on-a-running-system.md +++ /dev/null @@ -1,38 +0,0 @@ -`mongooseimctl` subcommands for configuration reloading are: - -`mongooseimctl reload_local` - -`mongooseimctl reload_cluster` - -`mongooseimctl reload_cluster_dryrun` - -`reload_local` is unsafe as it reloads the configuration only on the local node. -This might introduce inconsistencies between different nodes in the cluster. -It's available as a safety mechanism for the rare case of a cluster-global reload failing. - -`reload_cluster` applies the configuration on all nodes in the cluster. -The prerequisite is that the same version of a config file must be available on -all nodes. All nodes in a cluster must have the same config loaded into memory -as well. There is a small exception from this rule, see "Node-specific options" -below on this page. - -`reload_cluster_dryrun` calculates and prints config changes, -but does not apply them. -Useful for debugging. - -## Non-reloadable options -Some options require restarting the server in order to be reloaded. -The following options' changes will be ignored when using `mongooseimctl` tool: - -* s2s.\* -* general.all_metrics_are_global -* \*.rdbms.\* - -## Node-specific options - -This option is deprecated and not available when using a config file in the TOML -format. - -For the documentation of this option for the `cfg` config format please refer to the -[MIM 3.7.1 documentation](https://esl.github.io/MongooseDocs/3.7.1/operation-and-maintenance/Reloading-configuration-on-a-running-system/) -or older. 
diff --git a/doc/operation-and-maintenance/Rolling-upgrade.md b/doc/operation-and-maintenance/Rolling-upgrade.md new file mode 100644 index 0000000000..91ad9d11e3 --- /dev/null +++ b/doc/operation-and-maintenance/Rolling-upgrade.md @@ -0,0 +1,89 @@ +## Rolling upgrade +For all MongooseIM production deployments we recommend running multiple server nodes connected in a cluster behind a load-balancer. +A rolling upgrade is the process of upgrading a MongooseIM cluster, one node at a time. +Before the rolling upgrade, make sure you have at least the number of nodes able to handle your traffic plus one, to guarantee availability and minimise downtime. +Running different MongooseIM versions at the same time beyond the duration of the upgrade is not recommended and not supported. + +The rolling upgrade procedure is recommended over configuration reload, which has not been supported since version 4.1. + +Please note that more complex upgrades that involve schema updates, customisations or functional changes might require a more specific, specially crafted migration procedure. + +If you only want to change the configuration file, please follow steps 1, 3, 4, 6, 7, 8. +This type of change can also be done one node at a time. +It requires you to check the cluster status, modify the configuration file and restart the node. + +The usual MongooseIM cluster upgrade can be achieved with the following steps: + +### 1. Check the cluster status. + +Use the following command on the running nodes and examine the status of the cluster: + +```bash +mongooseimctl mnesia info | grep "running db nodes" + +running db nodes = [mongooseim@node1, mongooseim@node2] +``` + +This command shows all running nodes. +A healthy cluster should list all nodes that are part of the cluster. + +Should you have any issues related to node clustering, please refer to the [Cluster configuration and node management](Cluster-configuration-and-node-management.md) section. + +### 2. 
Copy the configuration file. + +Make a copy of the configuration file before the upgrade, as some package managers might override your custom configuration with the default one. +Please note that since version 4.1 the `*.cfg` MongooseIM configuration format is no longer supported, and the configuration needs to be rewritten in the new `*.toml` format. + +### 3. Apply the changes from the migration guide. + +All modifications of the configuration file or updates of the database schema that are required to perform a version upgrade can be found in the Migration Guide section. +When upgrading across more than one version, please make sure to go over all consecutive migration guides. + +For example, when migrating from MongooseIM 3.7 to 4.1, please familiarize yourself with and apply all necessary changes described in the following pages of the Migration Guide section. + +* 3.7.0 to 4.0.0 +* 4.0.0 to 4.0.1 +* 4.0.1 to 4.1.0 + +### 4. Stop the running node. + +Use the following command to stop the MongooseIM node: + +```bash +mongooseimctl stop +``` + +### 5. Install the new MongooseIM version. + +You can get the new version of MongooseIM by either [building MongooseIM from source code](../user-guide/How-to-build.md) or [downloading and upgrading from a package](../../user-guide/Getting-started/#download-a-package). + +### 6. Start the node. + +Use the following commands to start the MongooseIM node and check the status of the node and the cluster: + +```bash +mongooseimctl start +mongooseimctl status + +mongooseimctl mnesia info | grep "running db nodes" +``` + +### 7. Test the cluster. + +Please verify that the nodes are running and part of the same cluster. +If the cluster is working as expected, the migration of the node is complete. + +### 8. Upgrade the remaining nodes. + +Once all the prior steps are completed successfully, repeat the process for all nodes that are part of the MongooseIM cluster.
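The cluster check used in steps 1, 6 and 7 can be scripted so that it is repeated consistently for every node. The following is a minimal sketch, not part of MongooseIM: `all_nodes_running` is a hypothetical helper that inspects the `running db nodes` line printed by the command from step 1, and the node names are the ones from the example output in that step. It uses a plain substring match, so it assumes that no node name is a prefix of another.

```bash
# Return 0 if every expected node name appears in the given
# "running db nodes" line; otherwise report the first missing node
# and return 1. all_nodes_running is a hypothetical helper.
all_nodes_running() {
  running_line=$1
  shift
  for node in "$@"; do
    case "$running_line" in
      *"$node"*) ;;                               # node is listed
      *) echo "missing: $node"; return 1 ;;       # node is not listed
    esac
  done
  return 0
}

# Usage on a live node (commented out, as it needs a running cluster):
# line=$(mongooseimctl mnesia info | grep "running db nodes")
# all_nodes_running "$line" mongooseim@node1 mongooseim@node2
```

Running the check before stopping a node and again after starting it gives a simple signal for whether it is safe to move on to the next node.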
+ +## Further cluster upgrade considerations + +Another way to perform a cluster upgrade while minimising possible downtime is to set up a parallel MongooseIM cluster running the newer version. +You can redirect the incoming traffic to the new cluster using a load-balancer. + +Once no connections are handled by the old cluster, it can be safely stopped and the migration is complete. + +We highly recommend testing a new software release in a staging environment before it is deployed to production. + +Should you need any help with the upgrade, deployments or load testing of your MongooseIM cluster, please reach out to us. MongooseIM consultancy and support is part of [our commercial offering](https://www.erlang-solutions.com/products/mongooseim.html). diff --git a/doc/operation-and-maintenance/cluster_restart.png b/doc/operation-and-maintenance/cluster_restart.png new file mode 100644 index 0000000000..a58e31150c Binary files /dev/null and b/doc/operation-and-maintenance/cluster_restart.png differ diff --git a/doc/operation-and-maintenance/incorrect_cluster_restart.png b/doc/operation-and-maintenance/incorrect_cluster_restart.png new file mode 100644 index 0000000000..e6fa8d18c1 Binary files /dev/null and b/doc/operation-and-maintenance/incorrect_cluster_restart.png differ diff --git a/mkdocs.yml b/mkdocs.yml index 7e8a5b4fb5..d8b5255540 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -84,7 +84,8 @@ nav: - 'Logging configuration': 'operation-and-maintenance/Logging.md' - 'Logging with Humio': 'operation-and-maintenance/Humio.md' - 'Logging fields': 'operation-and-maintenance/Logging-fields.md' - - 'Reloading configuration on a running system': 'operation-and-maintenance/Reloading-configuration-on-a-running-system.md' + - 'Rolling upgrade': 'operation-and-maintenance/Rolling-upgrade.md' + - 'Cluster restart': 'operation-and-maintenance/Cluster-restart.md' - 'Metrics': 'operation-and-maintenance/MongooseIM-metrics.md' - 'System Metrics Privacy Policy': 
'operation-and-maintenance/System-Metrics-Privacy-Policy.md' - 'Distribution over TLS': 'operation-and-maintenance/tls-distribution.md'