The blueprint in this repo allows managing Cloudify Manager instances (Tier 1 managers) using a master Cloudify Manager (Tier 2 manager).
In order to use the blueprint the following prerequisites are necessary:
- A working 4.3 (RC or GA) Cloudify Manager (this will be the Tier 2 manager). You need to be connected to this manager (cfy profiles use - see the example after this list).
- An SSH key linked to a cloud keypair (this will be used to SSH into the Tier 1 VMs).
- A clone of this repo.
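For example, connecting to the Tier 2 manager might look like this (the address, credentials and tenant here are placeholders):
cfy profiles use <TIER_2_MANAGER_IP> -u admin -p <ADMIN_PASSWORD> -t default_tenant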
Optional:
- A pip virtualenv with wagon installed in it. This is only necessary if you're planning on building the CMoM plugin yourself (more on the plugin below).
Two plugins are required to use the blueprint - an IaaS plugin (currently only OpenStack is supported) and the Cloudify Manager of Managers (or CMoM for short) plugin (which is a part of this repo).
First, upload the IaaS plugin to the manager, e.g. run:
cfy plugins upload <WAGON_URL> -y <YAML_URL>
Second, you'll need to create a Wagon from the CMoM plugin. Run (this assumes you're inside the manager-of-managers folder):
wagon create -f plugins/cmom -o <CMOM_WAGON_OUTPUT>
Now upload this plugin as well:
cfy plugins upload <CMOM_WAGON_OUTPUT> -y plugins/cmom/plugin.yaml
A few files need to be present on the Tier 2 manager. All of these files need to be accessible by Cloudify's user, cfyuser. Because of this, it is advised to place them in /etc/cloudify, and make sure they are chowned to cfyuser (e.g. chown cfyuser: /etc/cloudify/filename).
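For example, placing the SSH key under /etc/cloudify and handing ownership to cfyuser might look like this (the source path is only an illustration):
sudo cp ~/ssh_key /etc/cloudify/ssh_key
sudo chown cfyuser: /etc/cloudify/ssh_key
sudo chmod 600 /etc/cloudify/ssh_key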
The files are:
- The private SSH key connected to the cloud keypair. Its input is ssh_private_key_path.
- The install RPM (its world-accessible URL will be provided separately). Its input is install_rpm_path.
- The CA certificate and key. These will be used by the Tier 2 manager to connect to the Tier 1 managers, as well as for generating the Tier 1 managers' external certificates. The inputs are ca_cert and ca_key.
In summary, this inputs section should look like this:
inputs:
  ca_cert: /etc/cloudify/ca_certificate.pem
  ca_key: /etc/cloudify/ca_key.pem
  ssh_private_key_path: /etc/cloudify/ssh_key
  install_rpm_path: /etc/cloudify/cloudify-manager-install.rpm
Now all that is left is to edit the inputs file (you can copy the sample_inputs file and edit it - see the inputs section below for a full explanation), and run:
cfy install blueprint.yaml -b <BLUEPRINT_NAME> -d <DEPLOYMENT_ID> -i <INPUTS_FILE>
To get the outputs of the installation (currently the IPs of the master and slave Cloudify Managers) run:
cfy deployments outputs <DEPLOYMENT_ID>
Currently, each deployment has 2 outputs:
cluster_ips - The IPs of the current Tier 1 cluster leader and slaves, in the following format:
- "cluster_ips":
Description: The master and the slaves of the Tier 1 cluster
Value: {u'Master': u'10.0.0.21', u'Slaves': [u'10.0.0.22']}
cluster_status - Shows the overall health of the Tier 1 cluster, the full status of the cluster's leader (i.e. cfy status), as well as any errors that may have occurred during the retrieval of the status (e.g. if there was no connection to the cluster).
Important: the status is generated by running the get_status workflow. It is not updated automatically, and there are no scheduled runs of the workflow. See the get_status workflow for more information.
- "cluster_status":
Description: The current status of the cluster/leader
Value: {
u'leader_status': [
{u'service': u'InfluxDB', u'status': u'running'},
{u'service': u'Cloudify Composer', u'status': u'running'},
{u'service': u'AMQP-Postgres', u'status': u'running'},
{u'service': u'AMQP InfluxDB', u'status': u'running'},
{u'service': u'RabbitMQ', u'status': u'running'},
{u'service': u'Webserver', u'status': u'running'},
{u'service': u'Management Worker', u'status': u'running'},
{u'service': u'PostgreSQL', u'status': u'running'},
{u'service': u'Cloudify Console', u'status': u'running'},
{u'service': u'Manager Rest-Service', u'status': u'running'},
{u'service': u'Riemann', u'status': u'running'}
],
u'cluster_status': [
{
u'name': u'10.0.0.21',
u'database': u'OK',
u'consul': u'OK',
u'cloudify services': u'OK',
u'host_ip': u'10.0.0.21',
u'state': u'leader',
u'heartbeat': u'OK'
}, {
u'name': u'10.0.0.22',
u'database': u'OK',
u'consul': u'OK',
u'cloudify services': u'OK',
u'host_ip': u'10.0.0.22',
u'state': u'replica',
u'heartbeat': u'OK'
}
],
u'error': u''}
Below is a list with explanations for all the inputs necessary for the blueprint to function. Much of this is mirrored in the sample_inputs file.
Currently only OpenStack is supported as the platform for this blueprint. Implementations for other IaaSes will follow.
- os_image - OpenStack image name or ID to use for the new server
- os_flavor - OpenStack flavor name or ID to use for the new server
- os_network - OpenStack network name or ID the new server will be connected to
- os_subnet - OpenStack name or ID of the subnet that's connected to the network that is to be used by the manager
- os_keypair - OpenStack key pair name or ID of the key to associate with the new server
- os_security_group - The name or ID of the OpenStack security group the new server will connect to
- os_server_group_policy - The policy to use for the server group
- os_username - Username to authenticate to OpenStack with
- os_password - OpenStack password
- os_tenant - Name of OpenStack tenant to operate on
- os_auth_url - Authentication URL for KeyStone
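For example, the OpenStack part of an inputs file might look like this (all values are placeholders and depend on your environment):
inputs:
  os_image: <IMAGE_NAME_OR_ID>
  os_flavor: <FLAVOR_NAME_OR_ID>
  os_network: <NETWORK_NAME_OR_ID>
  os_subnet: <SUBNET_NAME_OR_ID>
  os_keypair: <KEYPAIR_NAME_OR_ID>
  os_security_group: <SECURITY_GROUP_NAME_OR_ID>
  os_server_group_policy: <SERVER_GROUP_POLICY>
  os_username: <OPENSTACK_USERNAME>
  os_password: <OPENSTACK_PASSWORD>
  os_tenant: <OPENSTACK_TENANT>
  os_auth_url: https://<KEYSTONE_HOST>:5000/v3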
There are currently 3 supported ways to assign the manager's IP. To toggle between the different modes you'll need to leave only one of the lines in infra uncommented - private_ip.yaml, floating_ip.yaml or private_fixed_ip.yaml needs to be imported.
Important: only one of the files mentioned above needs to be imported, otherwise the blueprint will not work.
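As an illustration, the relevant part of the blueprint's imports might then look like this (the infra/ directory prefix is an assumption based on the file names above):
imports:
  # keep exactly one of the following lines uncommented
  - infra/floating_ip.yaml
  # - infra/private_ip.yaml
  # - infra/private_fixed_ip.yaml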
The 3 modes are:
- Using the FloatingIP mechanism. This requires providing a special input: os_floating_network - the name or ID of the OpenStack network to use for allocating floating IPs
- Using only an internal network, without a floating IP. This requires creating a new port, which is assumed to be connected to an existing subnet.
- Using a resource pool of IPs and hostnames known in advance. Like the previous mode, this requires creating a new port. This method also creates a "resource pool" object that holds a list of resources and allocates them as the need arises. The input for this mode is: resource_pool - a list of resources from which the IP addresses and the hostnames should be chosen. The format should be as follows:
resource_pool:
  - ip_address: <IP_ADDRESS_1>
    hostname: <HOSTNAME_1>
  - ip_address: <IP_ADDRESS_2>
    hostname: <HOSTNAME_2>
The following inputs are only relevant in KeyStone v3 environments:
- os_region - OpenStack region to use
- os_project - Name of OpenStack project (tenant) to operate on
- os_project_domain - The name of the OpenStack project domain to use
- os_user_domain - The name of the OpenStack user domain to use
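For example, a KeyStone v3 section of the inputs file might look like this (values are placeholders):
inputs:
  os_region: <REGION_NAME>
  os_project: <PROJECT_NAME>
  os_project_domain: <PROJECT_DOMAIN_NAME>
  os_user_domain: <USER_DOMAIN_NAME>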
When working with block storage devices (e.g. Cinder volumes) there is a special input that needs to be provided:
os_device_mapping - this is a list of volumes as defined by the API here. An example input would look like this:
os_device_mapping:
  - boot_index: "0"
    uuid: "41a1f177-1fb0-4708-a5f1-64f4c88dfec5"
    volume_size: 30
    source_type: image
    destination_type: volume
    delete_on_termination: true
Where uuid is the UUID of the OS image that should be used when creating the volume.
Note: When using the os_device_mapping input, the os_image input should be left empty.
Other potential inputs (for example, with subnet names, CIDRs etc.) might be added later.
These are general inputs necessary for the blueprint:
- install_rpm_path - as specified above
- ca_cert - as specified above
- ca_key - as specified above
- manager_admin_password - as specified above
- manager_admin_username - the admin username for the Tier 1 managers (default: admin)
- num_of_instances - the number of Tier 1 instances to be created (default: 2). This affects the size of the HA cluster.
- ssh_user - user name used when SSH-ing into the Tier 1 manager VMs
- ssh_private_key_path - as described above
- additional_config - an arbitrary dictionary which should mirror the structure of config.yaml. It will be merged (while taking precedence) with the config as described in the cloudify.nodes.CloudifyTier1Manager type in the plugin.yaml file. Whenever possible the inputs in the blueprint.yaml file should be used. For example:
inputs:
  additional_config:
    sanity:
      skip_sanity: true
    restservice:
      log:
        level: DEBUG
    mgmtworker:
      log_level: DEBUG
Inside ldap_inputs.yaml there is a datatype defined for LDAP inputs, which is convenient to use. All the following inputs need to reside under ldap_config in the inputs file:
- server - The LDAP server address to authenticate against
- username - LDAP admin username. This user needs to be able to make requests against the LDAP server
- password - LDAP admin password
- domain - The LDAP domain to be used by the server
- dn_extra - Extra LDAP DN options, separated by the ; sign (e.g. a=1;b=2). Useful, for example, when it is necessary to provide an organization ID.
- is_active_directory - Specify whether the LDAP server used for authentication is an Active Directory server.
The actual input should look like this:
inputs:
  ldap_config:
    server: SERVER
    username: USERNAME
    password: PASSWORD
    ...
NOTE: When mentioning local paths in the context of the inputs below, the paths in question are paths on the Tier 2 manager. So if it is desirable to upload files from a local location, these files need to be present on the Tier 2 manager in advance. Otherwise, URLs may be used freely.
It is possible to create/upload certain types of resources on the Tier 1 cluster after installation. Those are:
tenants - a list of tenants to create after the cluster is installed. The format is:
inputs:
  tenants:
    - <TENANT_NAME_1>
    - <TENANT_NAME_2>
plugins - a list of plugins to upload after the cluster is installed. The format is:
inputs:
  plugins:
    - wagon: <WAGON_1>
      yaml: <YAML_1>
      tenant: <TENANT_1>
    - wagon: <WAGON_2>
      yaml: <YAML_2>
      visibility: <VIS_2>
Where:
- WAGON is either a URL of a Cloudify plugin wagon (e.g. openstack.wgn), or a local (i.e. on the Tier 2 manager) path to such a wagon (required)
- YAML is the plugin's plugin.yaml file - again, either a URL or a local path (required)
- TENANT is the tenant to which the plugin will be uploaded (the tenant needs to already exist on the manager - use the above tenants input to create any tenants in advance). (Optional - default is default_tenant)
- VISIBILITY defines who can see the plugin - must be one of [private, tenant, global] (Optional - default is tenant).
Both WAGON and YAML are required fields.
secrets - a list of secrets to create after the cluster is installed. The format is:
inputs:
  secrets:
    - key: <KEY_1>
      string: <STRING_1>
      file: <FILE_1>
      visibility: <VISIBILITY_1>
Where:
- KEY is the name of the secret, which will then be used by other blueprints via the intrinsic get_secret function (required)
- STRING is the string value of the secret [mutually exclusive with FILE]
- FILE is a local path to a file whose contents should be used as the secret's value [mutually exclusive with STRING]
- VISIBILITY defines who can see the secret - must be one of [private, tenant, global] (Optional - default is tenant).
KEY is a required field, as well as one (and only one) of STRING or FILE.
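For instance, a secrets input using both forms might look like this (the key names and paths are only illustrative):
inputs:
  secrets:
    - key: openstack_password
      string: <PASSWORD>
    - key: tier1_ssh_key
      file: /etc/cloudify/ssh_key
      visibility: tenant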
blueprints - a list of blueprints to upload after the cluster is installed. The format is:
inputs:
  blueprints:
    - path: <PATH_1>
      id: <ID_1>
      filename: <FILENAME_1>
      tenant: <TENANT_1>
      visibility: <VISIBILITY_1>
Where:
- PATH can be either a local blueprint YAML file, a blueprint archive or a URL to a blueprint archive (required)
- ID is the unique identifier for the blueprint (if not specified, the name of the blueprint folder/archive will be used)
- FILENAME is the name of an archive's main blueprint file. Only relevant when uploading an archive
- TENANT is the tenant to which the blueprint will be uploaded (the tenant needs to already exist on the manager - use the above tenants input to create any tenants in advance). (Optional - default is default_tenant)
- VISIBILITY defines who can see the blueprint - must be one of [private, tenant, global] (Optional - default is tenant).
deployments - a list of additional deployments to create on the manager after install. The format is:
inputs:
  deployments:
    - deployment_id: <DEP_ID_1>
      blueprint_id: <BLU_ID_1>
      inputs: <INPUTS_1>
      tenant: <TENANT_1>
      visibility: <VISIBILITY_1>
Where:
- DEPLOYMENT_ID is the unique identifier for the deployment (if not specified, the ID of the blueprint will be used)
- BLUEPRINT_ID is the unique identifier for the blueprint
- INPUTS is either a dictionary of inputs for the deployment, or a path to a local (i.e. accessible on the Tier 2 manager) YAML file
- TENANT is the tenant in which the deployment will be created (the tenant needs to already exist on the manager - use the above tenants input to create any tenants in advance). (Optional - default is default_tenant)
- VISIBILITY defines who can see the deployment - must be one of [private, tenant, global] (Optional - default is tenant).
scripts - a list of scripts to run after the manager's installation. All these scripts need to be available on the Tier 2 manager and accessible by cfyuser. These scripts will be executed after the manager is installed but before the cluster is created. The format is:
scripts:
  - <PATH_TO_SCRIPT_1>
  - <PATH_TO_SCRIPT_2>
files - a list of files to copy to the Tier 1 managers from the Tier 2 manager after the Tier 1 managers' installation. All these files need to be available on the Tier 2 manager and accessible by cfyuser. These files will be copied after the manager is installed but before the cluster is created. The format is:
files:
  - src: <TIER_2_PATH_1>
    dst: <TIER_1_PATH_1>
  - src: <TIER_2_PATH_2>
    dst: <TIER_1_PATH_2>
The following inputs are only relevant when upgrading a previous deployment. Use them only when installing a new deployment to which you wish to transfer data/agents from an old deployment.
- restore - Should the newly installed Cloudify Manager be restored from a previous installation. Must be used in conjunction with some of the other inputs below. See plugin.yaml for more details (default: false)
- snapshot_path - A local (relative to the Tier 2 manager) path to a snapshot that should be used. Mutually exclusive with old_deployment_id and snapshot_id (default: '')
- old_deployment_id - The ID of the previous deployment which was used to control the Tier 1 managers. If the backup workflow was used with default values there will be a special folder with all the snapshots from the Tier 1 managers. If the backup input is set to false, snapshot_id must be provided as well (default: '')
- snapshot_id - The ID of the snapshot to use. This is only relevant if old_deployment_id is provided as well (default: '')
- transfer_agents - If set to true, an install_new_agents command will be executed after the restore is complete (default: true)
- restore_params - An optional list of parameters to pass to the underlying cfy snapshots restore command. Accepted values are: [--without-deployment-envs, --force, --restore-certificates, --no-reboot]. These need to be passed as-is with both dashes. (default: [])
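For example, a restore section of the inputs file might look like this (the IDs are placeholders):
inputs:
  restore: true
  old_deployment_id: <OLD_DEPLOYMENT_ID>
  snapshot_id: <SNAPSHOT_ID>
  transfer_agents: true
  restore_params: [--restore-certificates]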
The backup workflow can be used at any time. It creates a snapshot on the Tier 1 cluster, and downloads it to the Tier 2 manager. The snapshots are all saved in /etc/cloudify/snapshots/DEPLOYMENT_ID.
Run the workflow like this:
cfy executions start backup -d <DEPLOYMENT_ID> -p inputs.yaml
Where the optional inputs file may contain the following:
- snapshot_id - The ID of the snapshot to create (will default to the current time/date)
- backup_params - An optional list of parameters to pass to the underlying cfy snapshots create command. Accepted values are: [--include-metrics, --exclude-credentials, --exclude-logs, --exclude-events]. These need to be passed as-is with both dashes. (default: [])
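A sample inputs.yaml for the backup workflow might look like this (the snapshot ID is only an example):
snapshot_id: weekly-backup-01
backup_params: [--exclude-logs, --exclude-events]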
This workflow gets the cluster/leader status of the Tier 1 cluster. It does not accept any params, and it can be run at any time. The workflow populates the status runtime property of the cloudify_cluster node, which is then reflected in the cluster_status deployment output. See more in the Outputs section.
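For example:
cfy executions start get_status -d <DEPLOYMENT_ID>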
This workflow allows uploading blueprints to the Tier 1 cluster. The workflow accepts a single param, blueprints, which is a list of blueprints in the format described in Additional inputs.
This workflow allows uploading plugins to the Tier 1 cluster. The workflow accepts a single param, plugins, which is a list of plugins in the format described in Additional inputs.
This workflow allows creating tenants on the Tier 1 cluster. The workflow accepts a single param, tenants, which is a list of tenants in the format described in Additional inputs.
This workflow allows creating secrets on the Tier 1 cluster. The workflow accepts a single param, secrets, which is a list of secrets in the format described in Additional inputs.
This workflow allows creating deployments on the Tier 1 cluster. The workflow accepts a single param, deployments, which is a list of deployments in the format described in Additional inputs.
This workflow allows performing the following operations in a single run:
- Create tenants
- Create secrets
- Upload plugins
- Upload blueprints
- Create deployments
It accepts the same params as the previous 5 workflows:
parameters:
  tenants: []
  secrets: []
  plugins: []
  blueprints: []
  deployments: []
All of those are lists in the format described in Additional inputs.
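A run of this workflow might then look like the following sketch (the workflow name is a placeholder - use the name defined in plugin.yaml, and provide the lists in a params file):
cfy executions start <WORKFLOW_NAME> -d <DEPLOYMENT_ID> -p params.yaml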
This workflow allows executing a workflow on the Tier 1 cluster. This is similar to running cfy executions start on a manager. The format of the inputs is:
parameters:
  workflow_id:
    description: The ID of the workflow to execute
    type: string
  deployment_id:
    description: The ID of the deployment on which to execute the workflow
    type: string
  parameters:
    description: Parameters for the workflow (can be provided like inputs)
    default: {}
  allow_custom_parameters:
    description: >
      Allow passing custom parameters (which were not
      defined in the workflow's schema in the blueprint) to the execution
    type: boolean
    default: false
  force:
    description: >
      Execute the workflow even if there is an ongoing
      execution for the given deployment
    type: boolean
    default: false
  timeout:
    description: Operation timeout in seconds
    type: integer
    default: 900
  include_logs:
    description: Include logs in returned events
    type: boolean
    default: true
  queue:
    description: >
      If set, executions that can't currently run will be queued and run
      automatically when possible.
    type: boolean
    default: false
  tenant_name:
    description: The name of the tenant in which the deployment exists
    type: string
    default: ''
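For example, a params file for this workflow might look like this (the deployment, tenant and parameter names are placeholders):
workflow_id: install
deployment_id: <TIER_1_DEPLOYMENT_ID>
tenant_name: default_tenant
parameters:
  <PARAM_NAME>: <PARAM_VALUE>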
The blueprint implements an auto-healing mechanism for the Tier 1 managers. Because Tier 1 managers are configured in a cluster and software issues are handled by HA failovers, healing is only performed on nodes that lose connection to the Tier 2 manager (e.g. shut-off or terminated VMs, network issues, etc.).
A simple monitoring policy is defined that sends metrics (using the Diamond plugin) to the Tier 2 manager.
Metric heartbeats are sent every 10 seconds by default. See here to learn how to alter the interval if necessary.
The above metrics are parsed by a simple host-failure policy, which is triggered if the metrics stop being delivered. This policy triggers a custom healing workflow, which builds on the default heal workflow and adds functionality to it. The additional functionality stems from the fact that we need to first remove the faulty node from the Tier 1 HA cluster, and then, after it has been reinstalled, have it rejoin the same cluster.
The flow of the healing workflow is as follows:
- Try to find at least one Tier 1 manager that is still alive. If none is found, the workflow will fail and retry. This is intended - the scenario we're trying to avoid is a case where connection to the whole cluster is down. We don't want to perform heal in this case, because the cluster is stateful, and healing is not intended for disaster recovery.
In case the whole cluster is down, the workflow will retry the default number of times (60) with the default interval (15 seconds), unless configured otherwise in the blueprint (see here to learn how to alter task_retries and retry_interval). The message displayed in this scenario will be: Could not find a profile with a cluster configured. This might mean that the whole network is unreachable.
- Once found, remove the faulty node's IP from the cluster. This step is necessary in order for it to rejoin the cluster later on.
- Do a backup - this is just a precaution. If something goes wrong with the healing, you can create a new cluster that will be restored from a snapshot created during this step.
The snapshots are saved in a folder created specifically for the deployment being backed up under /etc/cloudify. Unless specified otherwise, the default is to use the current date and time as the snapshot name. So, a snapshot path will look like this: /etc/cloudify/snapshots/<DEPLOYMENT_NAME>/2018-03-21-09:09:05.zip
- Uninstall the faulty node. This means removing the whole VM.
- Reinstall the faulty node. This means re-creating the VM and reinstalling Cloudify Manager on it.
Note that the manager will be recreated with the same IP and other configurations.
- Re-join the Tier 1 HA cluster.
Because we're expecting HA failovers in cases of faulty nodes, the interval between subsequent heal workflow runs was increased to 600 seconds (from the default 300), in order to accommodate the selection of a new cluster leader. The value can be configured in the main blueprint YAML file.
After a successful heal, any users working with the Tier 1 cluster via the CLI should run cfy cluster update-profile in order to update their local profiles.
Users that are working with the Tier 1 cluster via the GUI will have to use a different IP when connecting to the manager if the healed node was the cluster leader. If the healed node was a replica, no further actions are required.
Important: this is a beta feature, and it shouldn't be used in production. It is not guaranteed to remain a part of this product.
The meta blueprint and plugin can be used to aggregate several MoM deployments in order to manage them more easily.
First, upload the meta plugin to the Tier 2 manager.
Next, upload the meta_blueprint.yaml blueprint to the manager, and create a deployment from it. For example:
cfy blueprints upload meta_blueprint.yaml -b meta
cfy deployments create meta -b meta
Then, run the add_deployment workflow to add any live MoM deployments. For example, if you have a deployment with the ID cfy_manager_dep_1, run:
cfy executions start add_deployment -d meta -p deployment_id=cfy_manager_dep_1
You can run this workflow for every MoM deployment you have.
And finally, you can run the get_status workflow on the meta deployment, to populate its outputs with the statuses of all the attached deployments.
cfy executions start get_status -d meta
Then check the outputs of the meta deployment to get the statuses.
Prepare your Tier 2 manager by placing the files from ./options in its /etc/cloudify directory.
Update your inputs with something like this:
files:
  - src: /etc/cloudify/ssh_key
    dst: /home/centos/ssh_key
  - src: /etc/cloudify/0001-skip-host-checking-on-patchify.patch
    dst: /home/centos/0001-skip-host-checking-on-patchify.patch
scripts: [/etc/cloudify/patch_manager.sh]