From 75c3b33c668324b1001c2c5b1bda98b83a5e1558 Mon Sep 17 00:00:00 2001
From: XaverStiensmeier <36056823+XaverStiensmeier@users.noreply.github.com>
Date: Thu, 25 May 2023 11:07:49 +0200
Subject: [PATCH 1/2] added exact versions for openstacksdk and python-openstackclient (#413)

---
 requirements.txt | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 8aa11c8c..a5b5e1bf 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,6 +1,10 @@
+openstacksdk==0.62
 mergedeep
-openstacksdk
 paramiko
-python-openstackclient
-sshtunnel
+python-cinderclient
+python-keystoneclient
+python-novaclient
+python-openstackclient==6.0.0
+PyYAML
 shortuuid
+sshtunnel
\ No newline at end of file

From 769eb10fdd7102e03485c55bb8faf963ab83cda3 Mon Sep 17 00:00:00 2001
From: XaverStiensmeier <36056823+XaverStiensmeier@users.noreply.github.com>
Date: Tue, 30 May 2023 16:10:48 +0200
Subject: [PATCH 2/2] Keep master updated (#401)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* add apt.bi.denbi.de as package source
* update slurm tasks (now uses self-built Slurm packages -> v22.05.7, restructure slurm files)
* add documentation to build newer Slurm package
* fixes
* slurmrestd uses openapi/v0.0.38
* Added check_nfs as a non-fatal evaluation (#366)
* Added "." and "-" cases for cid. This allows further rescuing and gives info messages. (#365)
* Added identifier for when no profile is defined to have a distinct identifier.
* Activated vpn setup
* Fixed example command
* Added logging info for file push and commands
* fix slurmrestd configuration
* Implementing wireguard
* update task order (slurm-server)
* fix default user chown settings
* Add an additional mariadb repository for Ubuntu 20.04. Zabbix 7.2 needs at least MariaDB 10.5 or higher and Focal comes with MariaDB 10.3.
* Extend slurm documentation.
* Extends documentation that BiBiGrid now supports Ubuntu 20.04/22.04 and Debian 11 (fixes #348).
* cleanup
* fix typos in documentation
* Updated wg0
* fix typos in documentation
* add workflow-job to lint python/ansible
* add more output
* add more output
* update runner working directory
* make ansible_lint happy
* rewrite linting workflow, add linting dependencies
* fix a typo
* fix pylintrc -> remove ignore-pattern=test/ (not needed, since pylint currently lints the bibigrid folder), make pylint happy
* fixing jinja
* changed jinja
* Fixed wrong when clause
* Removed unnecessary comments and added index implementation
* this_peer is now used
* Added configuration reload if necessary
* Moved restart to handlers
* Added missing handler
* Changed to systemd setup
* Fixed nfs
* Fixed a few bugs, more to come
* added some defaults
* Added vpn wkr without ip
* removed unnecessary print and fixed typo
* added vpn counter
* debugging bug
* debugging vpnwkr naming is wrong
* Commenting out worker creation
* Fixed bug making first worker and numberless
* fixed number order in deletion
* vpn workers added to instances.yml
* Added key generator for wireguard keys. Fixed minor bugs and added wireguard vpn support except subnets
* Added subnet cidr
* Fixing default value bugs
* added identifier
* added identifier as variable and changed providers to access all flavors
* reformatted
* slurm
* fixed ip assigning
* foreign workers are now included in compute nodes
* Added vpnwkrs to playbook start
* Fixed formatting.
  Added identifier instead of "Test" for wireguard configuration to improve debugging
* Larger rework of instances file
* fixing bugs caused by aforementioned rework
* fixing bugs caused by aforementioned rework
* fixing bugs caused by aforementioned rework
* fixing bugs caused by aforementioned rework
* cluster_dict no longer needed for ansible configuration
* Changed instances_yml so it allows grouping by cloud
* Renamed to match jinja extension of other files
* instances.master
* instances.master
* removed master from instances list and fixed minor bugs.
* Fixed slicing
* Removed empty vpnworkers list as there can be only one
* Removed no longer needed import
* minor reference fixes regarding master and vpn
* Changed ip to cidr as it should be in nfs exports
* removed faulty space in nfs export entry
* added vpnwkrs to list of nodes to run ansible-playbook on
* added missing vpnwkr
* Set default partition
* Removed default partition as this key doesn't exist
* default if cloud fits
* all credentials will now be stored. Not compatible with save script yet.
* fixed wrong parameter type due to ac handling multiple providers now instead of just one
* Fixed cidr bug
* changed cloud_specification to use identifier
* Fixed master not being filtered out due to buggy detection
* create is now cloud structured but badly implemented (needs asynchronous implementation)
* Removed master = none
* removed faulty bracket.
* Worker start follows cloud structure now
* fixed badly placed assignment of ac_cloud_yaml
* replaced no longer fitting regex by an actual exact check using slurm's hostname resolution
* fixed old variable name leading to hiccups
* Changed nfs exports to add all subnets. Currently not very nice looking, but working.
* Added comments and improved variable names.
* Added delete_server.py routine and connected it to fail.sh (untested).
* Further grouped code and simplified logging.
* fixed minor bugs and added a little bit of logging.
* patch to wait for post-launch services to stop
* Added private_v4 to configuration implementation. Bit dirty.
* Changed nfs for workers back to private_v4. Will crash with vpnwkr as long as security groups are not set correctly.
* Added missing instances
* add dnsmasq support (#372) (#380)
* add dnsmasq support (#372)
* extend dnsmasq support (#372)
* bugfixes dnsmasq support (#372)
* fix ansible syntax, add all vpnworkers to dnsmasq.hosts (#372), change order of copying clouds.yaml, many changes
* Added wireguard_ip
* wireguard_ip increased by 1 to ignore master
* Added a print for private_v4 to symbolize the start of dns entry creation
* Add support for additional vars file: hosts.yml. Extend hosts.j2 template to support worker entries
* extends instances configuration, add worker_userdata template
* remove unused wireguard-worker.yml, add userdata support (create_server.py), enable ip forwarding and tcp mtu probing on vpn gateways
* Fix program crash when image is not active (#382)
* Fixed function missing call
* Fixed linter that wasn't troubled before
* Fix ephemeral not working (#385)
* implemented usage of host_vars
* probably solved, but not best solution yet
* changed from host_vars to group_vars to have fewer files doing the same work
* update requirements.txt
* add ConfigurationException
* Provider and its implementation for Openstack get another method to add allowed_addresses to an interface/port
* Remove no longer used functions/code fragments. Add support for extended network configuration when creating a multi-cloud cluster.
* added hybrid cloud
* updating check documentation
* updating check documentation
* updating check documentation
* Removed artefact
* Filled text beyond headings
* Add security group support to provider and its implementing classes.
* Update create action:
  - support for security groups
  - slight restructuring
* add wireguard network to list of allowed addresses
* fix wrong usage of jinja templating
* add usage of security groups when creating a worker
* fix wireguard systemd network configuration
* add firewall rules when running in a multi-cloud setup
* add termination of created security groups, fix an issue concerning adding allowed addresses
* fix "allowed addresses" when running with more than 2 providers
* pin openstacksdk to an older version to avoid deprecation warnings.
* Added host file solution for vpnwkrs. Moved wireguard to configuration.
* Added host vars to deletion process and fixed bug of vpnwkrs using group vars instead of host vars.
* Fixing structural changes due to merge
* Fixed vpn workers getting lost
* fixed merge bug, improved data structure ansible/jinja
* Removed another bug regarding passing too many arguments.
* removed delay for now
* fixed worker count
* fixed wireguard
* Added reattempt for ConflictException, still not perfect.
* Further fixed vpnwkr merge issues
* Adapted command to new group vpn that contains both master and vpnwkr
* Fixed wireguard ip bug
* fixed bug of wireguard not being installed on vpn-worker
* Changed "local" to "ssh" in order to avoid sudo rights issue on master.
* fixed group name?
* adapted timeout based on experience
* fixed group name, now using "-" instead of ":"
* fixed userdata being a list because of using readlines instead of read. Now it is a string.
* group name cannot contain '-', therefore switched to underscores. Maybe change this in the node naming convention as well.
* Make all clouds default
* first draft: add ip routes
* Added ip routes to main.yml
* Changed ip route registration to make use of linux network files
* Workers now save the gateway_ip (private_v4 of master or vpnwkr). Also fixed a counting error.
* now using common variable wireguard_common instead of group_var wireguard, which is always missing on workers.
* Added rights.
* Disabling netplan and going full networkd
* Disabling cloud network changes after initialization
* Added netplan deactivation
* Fixed connection issues
* Added missing handler and added a task that updates the host file on workers
* Fixed minor bad namings and added missing ".yaml" extension to task file
* Added implementation of "bibiname", a short script that allows node name creation
* fixed name issue regarding slurm user executing ansible. Now master name is determined without user involvement.
* renamed task to "generate bibiname script"
* Adapted scripts to meet hybrid cloud solution
* Added delete_server.py script to bin copied files
* fixed fail and terminate script
* changed terminate script to timeout delete
* fixed minor code issues
* fixed linting issues delete_server.py
* fixed linting issues provider.py
* fixed linting issues startup_tests.py
* fixed linting issues
* fixed linting issues
* fixed typo
* fixed termination ConflictException not caught
* Added basic structure for multi_cloud.md
* Added elixir compute presentation as an additional light-weight read.
* added this file that - in the future - maybe should hold information regarding other projects that are using BiBiGrid. That makes it easier to keep an eye on all applications that might be affected by BiBiGrid's changes.
* Added basic wireguard.md documentation
* fixed grammar
* removed redundant warning
* added dnsmasq documentation structure
* removed encryption
* updated purpose description
* update DNS
* now creating empty hosts.yml file in order to allow ansible execution
* Remove entire vars folder
* fixed path
* changed provider.NAME to provider.cloud_specification['identifier']
* Removed vpnwkr from slurm as it should only be used to establish the connection and not for computing
* Decoupled the worker ansible host creation loop from vpnwkr host creation
* fixed vpnwkr still being added to the partition even though the node doesn't exist anymore
* Fixed bug in bibiname.j2 that gave master a number (master never has a number as there is only one)
* removed all references to instances.master
* removed further references to instances.yml and fixed bugs appearing because of it. Needs rework where master access can be shortened.
* fixed slurm.conf creating NodeName duplicates. Still unordered.
* Added "all" partition
* Removed instances.yml from create_server.py
* Removed instances.yml from delete_server.py
* removed last remains of instances.yml
* Servers are now created asynchronously.
* Fixed rest error
* Added support for features in slurm.conf
* Putting features into group_vars
* Updated configuration.md documentation to mention the new "features" key for instances and configuration.
* Added merge information and updated bibigrid.yml accordingly
* added features to master and worker groups
* fixed features not added as string to slurm.conf
* added missing empty line
* Now a single string instead of a list of features is understood as well.
* Improved cloud_identifier selection and documented the new way: picking the clouds.yaml key.
* updated configuration.md and removed many inaccuracies
* changed instances to instance for instance creation as workers are no longer created.
* Improved create.md
* Improved naming of subparagraph
* Fixed indentation, readability and documentation
* Improved logging information.
* Improved logging
* Added warning message when configuration is not a list.
* added configuration list parameter
* Added logging when network or subnet couldn't be set
* Improved logging of ConfigurationExceptions
* Improved documentation. Removed unnecessary variable in ide
* Improved documentation.
* Added brief information regarding wireguard and zabbix
* changed vpnwkr to vpngtw
* Fixed security group deletion for non-multi-cloud clusters.
--------- Co-authored-by: Jan Krüger Co-authored-by: Jan Krüger --- .gitignore | 7 +- bibigrid.yml | 8 +- bibigrid/core/actions/check.py | 1 + bibigrid/core/actions/create.py | 278 ++++++++++------- bibigrid/core/actions/ide.py | 5 +- bibigrid/core/actions/list_clusters.py | 17 +- bibigrid/core/actions/terminate_cluster.py | 86 ++++-- bibigrid/core/actions/version.py | 2 +- bibigrid/core/provider.py | 49 ++- bibigrid/core/startup.py | 6 +- bibigrid/core/utility/ansible_commands.py | 2 +- bibigrid/core/utility/ansible_configurator.py | 285 +++++++++++------- .../utility/handler/configuration_handler.py | 27 +- bibigrid/core/utility/handler/ssh_handler.py | 58 ++-- bibigrid/core/utility/id_generation.py | 4 +- .../utility/paths/ansible_resources_path.py | 23 +- .../core/utility/validate_configuration.py | 23 +- .../core/utility/wireguard/wireguard_keys.py | 29 ++ bibigrid/models/exceptions.py | 8 + bibigrid/openstack/openstack_provider.py | 136 ++++++--- .../markdown/bibigrid_feature_list.md | 1 + .../markdown/bibigrid_software_list.md | 2 + documentation/markdown/features/CLI.md | 17 +- .../features/bibigrid_ansible_playbook.md | 3 + documentation/markdown/features/check.md | 88 +++++- .../features/cloud_specification_data.md | 90 +++++- .../markdown/features/configuration.md | 224 +++++++++----- documentation/markdown/features/create.md | 87 +++++- documentation/markdown/features/ide.md | 5 + .../markdown/features/list_clusters.md | 6 +- .../markdown/features/multi_cloud.md | 53 ++++ .../markdown/features/terminate_cluster.md | 21 +- documentation/markdown/features/update.md | 6 +- documentation/markdown/features/version.md | 4 +- documentation/markdown/software/ansible.md | 23 +- documentation/markdown/software/dnsmasq.md | 4 + documentation/markdown/software/nfs.md | 34 ++- documentation/markdown/software/slurm.md | 20 +- documentation/markdown/software/theia_ide.md | 2 + documentation/markdown/software/wireguard.md | 5 + documentation/markdown/software/zabbix.md | 8 + ...Compute 2023 -- Multi-Cloud - BiBiGrid.pdf | Bin 0 -> 2026743 bytes documentation/used_by.md | 1 + .../roles/bibigrid/files/slurm/create.sh | 6 +- .../bibigrid/files/slurm/create_server.py | 262 ++++++++++------ .../bibigrid/files/slurm/delete_server.py | 68 +++++ .../roles/bibigrid/files/slurm/fail.sh | 11 + .../bibigrid/files/slurm/requirements.txt | 1 + .../roles/bibigrid/files/slurm/terminate.sh | 21 +- .../files/zabbix/zabbix_host_delete.py | 30 +- .../playbook/roles/bibigrid/handlers/main.yml | 21 ++ .../bibigrid/tasks/000-add-ip-routes.yml | 55 ++++ .../playbook/roles/bibigrid/tasks/001-apt.yml | 5 +- .../bibigrid/tasks/002-wireguard-vpn.yml | 41 +++ .../playbook/roles/bibigrid/tasks/003-dns.yml | 36 +++ .../roles/bibigrid/tasks/004-hosts.yml | 12 - .../roles/bibigrid/tasks/004-update-hosts.yml | 5 + .../roles/bibigrid/tasks/010-bin-server.yml | 6 + .../roles/bibigrid/tasks/020-disk-server.yml | 2 +- .../roles/bibigrid/tasks/020-disk-worker.yml | 2 +- .../roles/bibigrid/tasks/025-nfs-server.yml | 7 +- .../roles/bibigrid/tasks/025-nfs-worker.yml | 4 +- .../roles/bibigrid/tasks/042-slurm-server.yml | 23 ++ .../roles/bibigrid/tasks/999-theia.yml | 1 + .../playbook/roles/bibigrid/tasks/main.yml | 34 ++- .../roles/bibigrid/templates/bin/bibiname.j2 | 12 + .../bibigrid/templates/dns/dnsmasq.conf.j2 | 9 + .../roles/bibigrid/templates/dns/hosts.j2 | 12 + .../bibigrid/templates/dns/resolv.conf.j2 | 3 + .../networking/bibigrid_ens3.link.j2 | 13 + .../networking/bibigrid_ens3.network.j2 | 26 ++ 
.../roles/bibigrid/templates/slurm/slurm.conf | 35 ++- .../bibigrid/templates/slurm/slurmdbd.conf | 2 +- .../templates/slurm/worker_userdata.j2 | 10 + .../bibigrid/templates/wireguard/device.j2 | 24 ++ .../bibigrid/templates/wireguard/network.j2 | 16 + .../templates/zabbix/zabbix_agentd.conf.j2 | 4 +- resources/playbook/tools/tee.py | 65 ++-- .../{test_Provider.py => test_provider.py} | 104 ++++--- tests/{startupTests.py => startup_tests.py} | 27 +- ...urator.py => test_ansible_configurator.py} | 19 +- tests/test_listClusters.py | 10 +- 82 files changed, 2064 insertions(+), 738 deletions(-) create mode 100755 bibigrid/core/utility/wireguard/wireguard_keys.py create mode 100644 documentation/markdown/features/bibigrid_ansible_playbook.md create mode 100644 documentation/markdown/features/multi_cloud.md create mode 100644 documentation/markdown/software/dnsmasq.md create mode 100644 documentation/markdown/software/wireguard.md create mode 100644 documentation/pdfs/ELIXIR Compute 2023 -- Multi-Cloud - BiBiGrid.pdf create mode 100644 documentation/used_by.md create mode 100644 resources/playbook/roles/bibigrid/files/slurm/delete_server.py create mode 100644 resources/playbook/roles/bibigrid/tasks/000-add-ip-routes.yml create mode 100644 resources/playbook/roles/bibigrid/tasks/002-wireguard-vpn.yml create mode 100644 resources/playbook/roles/bibigrid/tasks/003-dns.yml delete mode 100644 resources/playbook/roles/bibigrid/tasks/004-hosts.yml create mode 100644 resources/playbook/roles/bibigrid/tasks/004-update-hosts.yml create mode 100644 resources/playbook/roles/bibigrid/templates/bin/bibiname.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/dns/dnsmasq.conf.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/dns/hosts.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/dns/resolv.conf.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/networking/bibigrid_ens3.link.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/networking/bibigrid_ens3.network.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/slurm/worker_userdata.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/wireguard/device.j2 create mode 100644 resources/playbook/roles/bibigrid/templates/wireguard/network.j2 rename tests/provider/{test_Provider.py => test_provider.py} (72%) rename tests/{startupTests.py => startup_tests.py} (62%) rename tests/{test_ansibleConfigurator.py => test_ansible_configurator.py} (97%) diff --git a/.gitignore b/.gitignore index bea74e83..e8c3cc34 100644 --- a/.gitignore +++ b/.gitignore @@ -5,10 +5,9 @@ # variable resources resources/playbook/site.yml resources/playbook/ansible_hosts -resources/playbook/vars/instances.yml -resources/playbook/vars/login.yml -resources/playbook/vars/worker_specification.yml -resources/playbook/vars/common_configuration.yml +resources/playbook/vars/ +resources/playbook/host_vars/ +resources/playbook/group_vars/ # any log files *.log diff --git a/bibigrid.yml b/bibigrid.yml index 69f58907..aa99d30d 100644 --- a/bibigrid.yml +++ b/bibigrid.yml @@ -1,6 +1,6 @@ # See https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/ (after update) - # First configuration will be used for general cluster information and must include the master. - # All other configurations mustn't include another master, but exactly one vpnWorker instead (keys like master). + # First configuration also holds general cluster information and must include the master. 
+ # All other configurations mustn't include another master, but exactly one vpngtw instead (keys like master). - infrastructure: openstack # former mode. Describes what cloud provider is used (others are not implemented yet) cloud: openstack # name of clouds.yaml cloud-specification key (which is value to top level key clouds) @@ -53,6 +53,7 @@ masterInstance: type: # existing type/flavor on your cloud. See launch instance>flavor for options image: # existing image on your cloud. See https://openstack.cebitec.uni-bielefeld.de/project/images pick an active one. Currently only ubuntu22.04 is supported + # features: # list # -- END: GENERAL CLUSTER INFORMATION -- @@ -61,6 +62,7 @@ # - type: # existing type/flavor on your cloud. See launch instance>flavor for options # image: # same as master # count: # any number of workers you would like to create with set type, image combination + # # features: # list # Depends on cloud image sshUser: # for example ubuntu @@ -90,4 +92,6 @@ # Currently, the case in Berlin, DKFZ, Heidelberg and Tuebingen. #localDNSLookup: True + #features: # list + #- [next configurations] # KEY NOT IMPLEMENTED YET diff --git a/bibigrid/core/actions/check.py b/bibigrid/core/actions/check.py index 144eafbe..cd4a15a2 100644 --- a/bibigrid/core/actions/check.py +++ b/bibigrid/core/actions/check.py @@ -6,6 +6,7 @@ LOG = logging.getLogger("bibigrid") + def check(configurations, providers): """ Uses validate_configuration to validate given configuration. diff --git a/bibigrid/core/actions/create.py b/bibigrid/core/actions/create.py index 609f638b..71da041d 100644 --- a/bibigrid/core/actions/create.py +++ b/bibigrid/core/actions/create.py @@ -20,7 +20,7 @@ from bibigrid.core.utility.paths import bin_path as biRP from bibigrid.models import exceptions from bibigrid.models import return_threading -from bibigrid.models.exceptions import ExecutionException +from bibigrid.models.exceptions import ExecutionException, ConfigurationException PREFIX = "bibigrid" SEPARATOR = "-" @@ -30,22 +30,22 @@ def get_identifier(identifier, cluster_id, worker_group="", additional=""): """ - This method does more advanced string formatting to generate master, vpnwkr and worker names - @param identifier: master|vpnwkr|worker + This method does more advanced string formatting to generate master, vpngtw and worker names + @param identifier: master|vpngtw|worker @param cluster_id: id of cluster @param worker_group: group of worker (every member of a group has same flavor/type and image) @param additional: an additional string to be added at the end @return: the generated string """ general = PREFIX_WITH_SEP + identifier + str(worker_group) + SEPARATOR + cluster_id - if additional: + if additional or additional == 0: return general + SEPARATOR + str(additional) return general MASTER_IDENTIFIER = partial(get_identifier, identifier="master", additional="") WORKER_IDENTIFIER = partial(get_identifier, identifier="worker") -VPN_WORKER_IDENTIFIER = partial(get_identifier, identifier="vpnwkr") +VPN_WORKER_IDENTIFIER = partial(get_identifier, identifier="vpngtw") KEY_PREFIX = "tempKey_bibi" KEY_FOLDER = os.path.expanduser("~/.config/bibigrid/keys/") @@ -54,6 +54,8 @@ def get_identifier(identifier, cluster_id, worker_group="", additional=""): CLUSTER_MEMORY_FOLDER = KEY_FOLDER CLUSTER_MEMORY_FILE = ".bibigrid.mem" CLUSTER_MEMORY_PATH = os.path.join(CLUSTER_MEMORY_FOLDER, CLUSTER_MEMORY_FILE) +DEFAULT_SECURITY_GROUP_NAME = "default" + SEPARATOR + "{cluster_id}" +WIREGUARD_SECURITY_GROUP_NAME = "wireguard" + SEPARATOR + 
"{cluster_id}" class Create: # pylint: disable=too-many-instance-attributes,too-many-arguments @@ -68,7 +70,7 @@ def __init__(self, providers, configurations, config_path, debug=False): :param providers: List of providers (provider) :param configurations: List of configurations (dict) :param config_path: string that is the path to config-file - :param debug: Bool. If True Cluster will offer shut-down after create and + :param debug: Bool. If True Cluster offer shut-down after create and will ask before shutting down on errors """ self.providers = providers @@ -83,7 +85,11 @@ def __init__(self, providers, configurations, config_path, debug=False): LOG.debug("Cluster-ID: %s", self.cluster_id) self.name = AC_NAME.format(cluster_id=self.cluster_id) self.key_name = KEY_NAME.format(cluster_id=self.cluster_id) - self.instance_counter = 0 + self.default_security_group_name = DEFAULT_SECURITY_GROUP_NAME.format(cluster_id=self.cluster_id) + self.wireguard_security_group_name = WIREGUARD_SECURITY_GROUP_NAME.format(cluster_id=self.cluster_id) + + self.worker_counter = 0 + self.vpn_counter = 0 self.thread_lock = threading.Lock() self.use_master_with_public_ip = configurations[0].get("useMasterWithPublicIp", True) LOG.debug("Keyname: %s", self.key_name) @@ -92,7 +98,7 @@ def generate_keypair(self): """ Generates ECDSA Keypair using system-function ssh-keygen and uploads the generated public key to providers. generate_keypair makes use of the fact that files in tmp are automatically deleted - ToDo find a more pythonic way to create an ECDSA keypiar + ToDo find a more pythonic way to create an ECDSA keypair See here for why using python module ECDSA wasn't successful https://stackoverflow.com/questions/71194770/why-does-creating-ecdsa-keypairs-via-python-differ-from-ssh-keygen-t-ecdsa-and :return: @@ -115,84 +121,81 @@ def generate_keypair(self): with open(CLUSTER_MEMORY_PATH, mode="w+", encoding="UTF-8") as cluster_memory_file: yaml.safe_dump(data={"cluster_id": self.cluster_id}, stream=cluster_memory_file) - def start_instance(self, provider, identifier, instance_type, network, volumes=None, - external_network=None): + def generate_security_groups(self): """ - Starts any (master,worker,vpn) single server/instance in given network on given provider - with floating-ip if master or vpn and with volume if master. - :param provider: provider server will be started on - :param identifier: string MASTER/WORKER/VPN_IDENTIFIER - :param instance_type: dict from configuration containing server type, image and count (but count is not needed) - :param network: string network where server will be started in. - All server of a provider are started in the same network - :param volumes: list of volumes that are to be attached to the server. 
Currently only relevant for master - :param external_network: string only needed if worker=False to create floating_ip - :return: + Generate a security groups: + - default with basic rules for the cluster + - wireguard when more than one provider is used (= multicloud) """ - # potentially weird counting due to master - with self.thread_lock: - if identifier == MASTER_IDENTIFIER: # pylint: disable=comparison-with-callable - name = identifier(cluster_id=self.cluster_id) - elif identifier == WORKER_IDENTIFIER: # pylint: disable=comparison-with-callable - name = identifier(number=self.instance_counter, cluster_id=self.cluster_id) - # else: - # name = identifier(number=self.instance_counter, cluster_id=self.cluster_id) - self.instance_counter += 1 - LOG.info("Starting instance/server %s", name) - flavor = instance_type["type"] - image = instance_type["image"] - server = provider.create_server(name=name, flavor=flavor, key_name=self.key_name, - image=image, network=network, volumes=volumes) - floating_ip = None - # pylint: disable=comparison-with-callable - if identifier == VPN_WORKER_IDENTIFIER or ( - identifier == MASTER_IDENTIFIER and self.use_master_with_public_ip): - # wait seems to be included. Not in documentation - floating_ip = provider.attach_available_floating_ip(network=external_network, - server=server)["floating_ip_address"] - elif identifier == MASTER_IDENTIFIER: - floating_ip = provider.conn.get_server(server["id"])["private_v4"] - # pylint: enable=comparison-with-callable - return floating_ip + for provider, configuration in zip(self.providers, self.configurations): + # create a default security group + default_security_group_id = provider.create_security_group(name=self.default_security_group_name)["id"] + + rules = [{"direction": "ingress", "ethertype": "IPv4", "protocol": None, "port_range_min": None, + "port_range_max": None, "remote_ip_prefix": None, "remote_group_id": default_security_group_id}, + {"direction": "ingress", "ethertype": "IPv4", "protocol": "tcp", "port_range_min": 22, + "port_range_max": 22, "remote_ip_prefix": "0.0.0.0/0", "remote_group_id": None}] + # when running a multi-cloud setup additional default rules are necessary + if len(self.providers) > 1: + # allow incoming traffic from wireguard network + rules.append({"direction": "ingress", "ethertype": "IPv4", "protocol": "tcp", "port_range_min": None, + "port_range_max": None, "remote_ip_prefix": "10.0.0.0/24", "remote_group_id": None}) + # allow incoming traffic from all other local provider networks + for tmp_configuration in self.configurations: + if tmp_configuration != configuration: + rules.append( + {"direction": "ingress", "ethertype": "IPv4", "protocol": "tcp", "port_range_min": None, + "port_range_max": None, "remote_ip_prefix": tmp_configuration['subnet_cidrs'], + "remote_group_id": None}) + provider.append_rules_to_security_group(default_security_group_id, rules) + configuration["security_groups"] = [self.default_security_group_name] # store in configuration + # when running a multi-cloud setup create an additional wireguard group + if len(self.providers) > 1: + _ = provider.create_security_group(name=self.wireguard_security_group_name)["id"] + configuration["security_groups"].append(self.wireguard_security_group_name) # store in configuration - def start_instances(self, configuration, provider): + def start_vpn_or_master_instance(self, configuration, provider): """ - Starts all instances of a provider using multithreading + Start master/vpn-worker of a provider :param configuration: dict 
configuration of said provider :param provider: provider :return: """ - LOG.info("Starting instances on %s", provider.NAME) - # threads = [] identifier, instance_type, volumes = self.prepare_vpn_or_master_args(configuration, provider) external_network = provider.get_external_network(configuration["network"]) + with self.thread_lock: + if identifier == MASTER_IDENTIFIER: # pylint: disable=comparison-with-callable + name = identifier(cluster_id=self.cluster_id) + else: + name = identifier(cluster_id=self.cluster_id, # pylint: disable=redundant-keyword-arg + additional=self.vpn_counter) # pylint: disable=redundant-keyword-arg + self.vpn_counter += 1 + LOG.info(f"Starting instance/server {name} on {provider.cloud_specification['identifier']}") + flavor = instance_type["type"] + image = instance_type["image"] + network = configuration["network"] + + # create a server and block until it is up and running + server = provider.create_server(name=name, flavor=flavor, key_name=self.key_name, image=image, network=network, + volumes=volumes, security_groups=configuration["security_groups"], wait=True) + configuration["private_v4"] = server["private_v4"] + + # get mac address for given private address + # Attention: The following source code works with Openstack and IPV4 only + configuration["mac_addr"] = None + for elem in server['addresses']: + for network in server['addresses'][elem]: + if network['addr'] == configuration["private_v4"]: + configuration["mac_addr"] = network['OS-EXT-IPS-MAC:mac_addr'] + if configuration["mac_addr"] is None: + raise ConfigurationException(f"MAC address for ip {configuration['private_v4']} not found.") - # Starts master/vpn. Uses return threading to get floating_ip of master/vpn - vpn_or_master_thread = return_threading.ReturnThread(target=self.start_instance, - args=[provider, - identifier, - instance_type, - configuration["network"], - volumes, - external_network]) - vpn_or_master_thread.start() - - # Starts all workers - # for worker_instance_type in configuration.get("workerInstances") or []: - # for worker in range(worker_instance_type["count"]): - # worker_thread = threading.Thread(target=self.start_instance, - # args=[provider, - # WORKER_IDENTIFIER, - # worker_instance_type, - # configuration["network"], - # True]) - # worker_thread.start() - # threads.append(worker_thread) - LOG.info("Waiting for servers to start-up on cloud %s", provider.cloud_specification['identifier']) - vpn_or_m_floating_ip_address = vpn_or_master_thread.join() - self.setup_reachable_servers(configuration, vpn_or_m_floating_ip_address) - # for thread in threads: - # thread.join() + # pylint: disable=comparison-with-callable + if identifier == VPN_WORKER_IDENTIFIER or (identifier == MASTER_IDENTIFIER and self.use_master_with_public_ip): + configuration["floating_ip"] = \ + provider.attach_available_floating_ip(network=external_network, server=server)["floating_ip_address"] + elif identifier == MASTER_IDENTIFIER: + configuration["floating_ip"] = server["private_v4"] # pylint: enable=comparison-with-callable def prepare_vpn_or_master_args(self, configuration, provider): """ @@ -211,27 +214,24 @@ def prepare_vpn_or_master_args(self, configuration, provider): identifier = VPN_WORKER_IDENTIFIER volumes = [] # only master has volumes else: - LOG.warning("Configuration %s has no vpnwkr or master and is therefore unreachable.", configuration) + LOG.warning("Configuration %s has no vpngtw or master and is therefore unreachable.", configuration) raise KeyError return identifier, instance_type, 
volumes - def setup_reachable_servers(self, configuration, vpn_or_m_floating_ip_address): + def initialize_instances(self): """ - Executes necessary commands on master or vpnwkr - :param configuration: said configuration - :param vpn_or_m_floating_ip_address: floating_ip to master or vpnwkr + Setup all servers """ - if configuration.get("masterInstance"): - self.master_ip = vpn_or_m_floating_ip_address - ssh_handler.ansible_preparation(floating_ip=vpn_or_m_floating_ip_address, - private_key=KEY_FOLDER + self.key_name, - username=self.ssh_user, - commands=self.ssh_add_public_key_commands) - elif configuration.get("vpnInstance"): - ssh_handler.execute_ssh(floating_ip=self.master_ip, - private_key=KEY_FOLDER + self.key_name, - username=self.ssh_user, - commands=ssh_handler.VPN_SETUP) + for configuration in self.configurations: + if configuration.get("masterInstance"): + self.master_ip = configuration["floating_ip"] + ssh_handler.ansible_preparation(floating_ip=configuration["floating_ip"], + private_key=KEY_FOLDER + self.key_name, username=self.ssh_user, + commands=self.ssh_add_public_key_commands) + elif configuration.get("vpnInstance"): + ssh_handler.execute_ssh(floating_ip=configuration["floating_ip"], + private_key=KEY_FOLDER + self.key_name, username=self.ssh_user, + commands=ssh_handler.VPN_SETUP) def prepare_volumes(self, provider, mounts): """ @@ -266,10 +266,19 @@ def prepare_configurations(self): :return: """ for configuration, provider in zip(self.configurations, self.providers): + configuration["cloud_identifier"] = provider.cloud_specification["identifier"] if not configuration.get("network"): configuration["network"] = provider.get_network_id_by_subnet(configuration["subnet"]) + if not configuration["network"]: + LOG.warning("Unable to set network. " + f"Subnet doesn't exist in cloud {configuration['cloud_identifier']}") + raise ConfigurationException(f"Subnet doesn't exist in cloud {configuration['cloud_identifier']}") elif not configuration.get("subnet"): configuration["subnet"] = provider.get_subnet_ids_by_network(configuration["network"]) + if not configuration["subnet"]: + LOG.warning("Unable to set subnet. Network doesn't exist.") + raise ConfigurationException("Network doesn't exist.") + configuration["subnet_cidrs"] = provider.get_subnet_by_id_or_name(configuration["subnet"])["cidr"] configuration["sshUser"] = self.ssh_user # is used in ansibleConfigurator def upload_data(self): @@ -277,44 +286,81 @@ def upload_data(self): Configures ansible and then uploads the modified files and all necessary data to the master :return: """ - if not os.path.isdir(aRP.VARS_FOLDER): - LOG.info("%s not found. Creating folder.", aRP.VARS_FOLDER) - os.mkdir(aRP.VARS_FOLDER) - ansible_configurator.configure_ansible_yaml(providers=self.providers, - configurations=self.configurations, + for folder in [aRP.VARS_FOLDER, aRP.GROUP_VARS_FOLDER, aRP.HOST_VARS_FOLDER]: + if not os.path.isdir(folder): + LOG.info("%s not found. 
Creating folder.", folder) + os.mkdir(folder) + if not os.path.isfile(aRP.HOSTS_FILE): + with open(aRP.HOSTS_FILE, 'a', encoding='utf-8') as hosts_file: + hosts_file.write("# placeholder file for worker DNS entries (see 003-dns)") + + ansible_configurator.configure_ansible_yaml(providers=self.providers, configurations=self.configurations, cluster_id=self.cluster_id) ssh_handler.execute_ssh(floating_ip=self.master_ip, private_key=KEY_FOLDER + self.key_name, username=self.ssh_user, filepaths=[(aRP.PLAYBOOK_PATH, aRP.PLAYBOOK_PATH_REMOTE), (biRP.BIN_PATH, biRP.BIN_PATH_REMOTE)], - commands=ssh_handler.ANSIBLE_START + - [ssh_handler.get_ac_command(self.providers[0], AC_NAME.format( - cluster_id=self.cluster_id))]) + commands=[ + ssh_handler.get_ac_command( + self.providers, + AC_NAME.format( + cluster_id=self.cluster_id))] + ssh_handler.ANSIBLE_START) - def start_start_instances_threads(self): + def start_start_instance_threads(self): """ Starts for each provider a start_instances thread and joins them. :return: """ - start_instances_threads = [] + start_instance_threads = [] for configuration, provider in zip(self.configurations, self.providers): - start_instances_thread = return_threading.ReturnThread(target=self.start_instances, - args=[configuration, provider]) - start_instances_thread.start() - start_instances_threads.append(start_instances_thread) - for start_instance_thread in start_instances_threads: + start_instance_thread = return_threading.ReturnThread(target=self.start_vpn_or_master_instance, + args=[configuration, provider]) + start_instance_thread.start() + start_instance_threads.append(start_instance_thread) + for start_instance_thread in start_instance_threads: start_instance_thread.join() - def create(self): + def extended_network_configuration(self): + """ + Configure master/vpn-worker network for a multi/hybrid cloud + :return: + """ + if len(self.providers) == 1: + return + + for provider_a, configuration_a in zip(self.providers, self.configurations): + # configure wireguard network as allowed network + allowed_addresses = [{'ip_address': '10.0.0.0/24', 'mac_address': configuration_a["mac_addr"]}] + # iterate over all configurations ... + for configuration_b in self.configurations: + # ... and pick all other configuration + if configuration_a != configuration_b: + LOG.info(f"{configuration_a['private_v4']} --> allowed_address_pair({configuration_a['mac_addr']}," + f"{configuration_b['subnet_cidrs']})") + # add provider_b network as allowed network + allowed_addresses.append( + {'ip_address': configuration_b["subnet_cidrs"], 'mac_address': configuration_a["mac_addr"]}) + # configure security group rules + provider_a.append_rules_to_security_group(self.wireguard_security_group_name, [ + {"direction": "ingress", "ethertype": "IPv4", "protocol": "udp", "port_range_min": 51820, + "port_range_max": 51820, "remote_ip_prefix": configuration_b["floating_ip"], + "remote_group_id": None}]) + # configure allowed addresses for provider_a/configuration_a + provider_a.set_allowed_addresses(configuration_a['private_v4'], allowed_addresses) + + def create(self): # pylint: disable=too-many-branches,too-many-statements """ Creates cluster and prints helpful cluster-info afterwards. If debug is set True it offers termination after starting the cluster. 
:return: exit_state """ - self.generate_keypair() try: + self.generate_keypair() self.prepare_configurations() - self.start_start_instances_threads() + self.generate_security_groups() + self.start_start_instance_threads() + self.extended_network_configuration() + self.initialize_instances() self.upload_data() self.print_cluster_start_info() if self.debug: @@ -322,14 +368,24 @@ def create(self): terminate_cluster.terminate_cluster(cluster_id=self.cluster_id, providers=self.providers, debug=self.debug) except exceptions.ConnectionException: + if self.debug: + LOG.error(traceback.format_exc()) LOG.error("Connection couldn't be established. Check Provider connection.") except paramiko.ssh_exception.NoValidConnectionsError: + if self.debug: + LOG.error(traceback.format_exc()) LOG.error("SSH connection couldn't be established. Check keypair.") except KeyError as exc: + if self.debug: + LOG.error(traceback.format_exc()) LOG.error(f"Tried to access dictionary key {str(exc)}, but couldn't. Please check your configurations.") except FileNotFoundError as exc: + if self.debug: + LOG.error(traceback.format_exc()) LOG.error(f"Tried to access resource files but couldn't. No such file or directory: {str(exc)}") except TimeoutError as exc: + if self.debug: + LOG.error(traceback.format_exc()) LOG.error(f"Timeout while connecting to master. Maybe you are trying to create a master without " f"public ip " f"while not being in the same network: {str(exc)}") @@ -337,6 +393,10 @@ def create(self): if self.debug: LOG.error(traceback.format_exc()) LOG.error(f"Execution of cmd on remote host fails: {str(exc)}") + except ConfigurationException as exc: + if self.debug: + LOG.error(traceback.format_exc()) + LOG.error(f"Configuration invalid: {str(exc)}") except Exception as exc: # pylint: disable=broad-except if self.debug: LOG.error(traceback.format_exc()) diff --git a/bibigrid/core/actions/ide.py b/bibigrid/core/actions/ide.py index 16f42bb2..32c20aa9 100644 --- a/bibigrid/core/actions/ide.py +++ b/bibigrid/core/actions/ide.py @@ -15,13 +15,14 @@ from bibigrid.core.utility.handler import cluster_ssh_handler DEFAULT_IDE_WORKSPACE = "${HOME}" -REMOTE_BIND_ADDRESS = 8181 DEFAULT_IDE_PORT_END = 8383 +REMOTE_BIND_ADDRESS = 8181 LOCAL_BIND_ADDRESS = 9191 MAX_JUMP = 100 LOCALHOST = "127.0.0.1" LOG = logging.getLogger("bibigrid") + def sigint_handler(caught_signal, frame): # pylint: disable=unused-argument """ Is called when SIGINT is thrown and terminates the program @@ -31,6 +32,8 @@ def sigint_handler(caught_signal, frame): # pylint: disable=unused-argument """ print("Exiting...") sys.exit(0) + + signal.signal(signal.SIGINT, sigint_handler) diff --git a/bibigrid/core/actions/list_clusters.py b/bibigrid/core/actions/list_clusters.py index 07e0173f..45e341a4 100644 --- a/bibigrid/core/actions/list_clusters.py +++ b/bibigrid/core/actions/list_clusters.py @@ -9,7 +9,7 @@ from bibigrid.core.actions import create -SERVER_REGEX = re.compile(r"^bibigrid-((master)-([a-zA-Z0-9]+)|(worker|vpnwkr)\d+-([a-zA-Z0-9]+)-\d+)$") +SERVER_REGEX = re.compile(r"^bibigrid-((master)-([a-zA-Z0-9]+)|(worker|vpngtw)\d+-([a-zA-Z0-9]+)-\d+)$") LOG = logging.getLogger("bibigrid") def dict_clusters(providers): @@ -48,7 +48,7 @@ def setup(cluster_dict, cluster_id, server, provider): if not cluster_dict.get(cluster_id): cluster_dict[cluster_id] = {} cluster_dict[cluster_id]["workers"] = [] - cluster_dict[cluster_id]["vpnwkrs"] = [] + cluster_dict[cluster_id]["vpngtws"] = [] server["provider"] = provider.NAME server["cloud_specification"] = 
provider.cloud_specification["identifier"] @@ -66,7 +66,7 @@ def print_list_clusters(cluster_id, providers): if cluster_dict.get(cluster_id): LOG.info("Printing specific cluster_dictionary") master_count, worker_count, vpn_count = get_size_overview(cluster_dict[cluster_id]) - print(f"\tCluster has {master_count} master, {vpn_count} vpnwkr and {worker_count} regular workers. " + print(f"\tCluster has {master_count} master, {vpn_count} vpngtw and {worker_count} regular workers. " f"The cluster is spread over {vpn_count + master_count} reachable provider(s).") pprint.pprint(cluster_dict[cluster_id]) else: @@ -90,7 +90,7 @@ def print_list_clusters(cluster_id, providers): else: LOG.warning("No master for cluster: %s.", cluster_key_id) master_count, worker_count, vpn_count = get_size_overview(cluster_node_dict) - print(f"\tCluster has {master_count} master, {vpn_count} vpnwkr and {worker_count} regular workers. " + print(f"\tCluster has {master_count} master, {vpn_count} vpngtw and {worker_count} regular workers. " f"The cluster is spread over {vpn_count + master_count} reachable provider(s).") else: print("No cluster found.") @@ -105,7 +105,7 @@ def get_size_overview(cluster_dict): LOG.info("Printing size overview") master_count = int(bool(cluster_dict.get("master"))) worker_count = len(cluster_dict.get("workers") or "") - vpn_count = len(cluster_dict.get("vpnwkrs") or "") + vpn_count = len(cluster_dict.get("vpngtws") or "") return master_count, worker_count, vpn_count @@ -117,7 +117,7 @@ def get_networks(cluster_dict): """ master = cluster_dict["master"] addresses = [{master["provider"]: list(master["addresses"].keys())}] - for server in (cluster_dict.get("vpnwkrs") or []): + for server in (cluster_dict.get("vpngtws") or []): addresses.append({server["provider"]: list(server["addresses"].keys())}) return addresses @@ -130,7 +130,7 @@ def get_security_groups(cluster_dict): """ master = cluster_dict["master"] security_groups = [{master["provider"]: master["security_groups"]}] - for server in (cluster_dict.get("vpnwkrs") or []): + for server in (cluster_dict.get("vpngtws") or []): security_groups.append({server["provider"]: server["security_groups"]}) return security_groups @@ -148,5 +148,6 @@ def get_master_access_ip(cluster_id, master_provider): master = create.MASTER_IDENTIFIER(cluster_id=cluster_id) if server["name"].startswith(master): return server.get("public_v4") or server.get("public_v6") or server.get("private_v4") - LOG.warning("Cluster %s not found on master_provider %s.", cluster_id, master_provider) + LOG.warning("Cluster %s not found on master_provider %s.", cluster_id, + master_provider.cloud_specification["identifier"]) return None diff --git a/bibigrid/core/actions/terminate_cluster.py b/bibigrid/core/actions/terminate_cluster.py index fbd5bcdd..a726738d 100644 --- a/bibigrid/core/actions/terminate_cluster.py +++ b/bibigrid/core/actions/terminate_cluster.py @@ -6,10 +6,14 @@ import logging import os import re +import time from bibigrid.core.actions import create +from bibigrid.models.exceptions import ConflictException + LOG = logging.getLogger("bibigrid") + def terminate_cluster(cluster_id, providers, debug=False): """ Goes through all providers and gets info of all servers which name contains cluster ID. @@ -23,8 +27,12 @@ def terminate_cluster(cluster_id, providers, debug=False): if not input(f"DEBUG MODE: Any non-empty input to shutdown cluster {cluster_id}. 
" "Empty input to exit with cluster still alive:"): return 0 + security_groups = [create.DEFAULT_SECURITY_GROUP_NAME] + if len(providers) > 1: + security_groups.append(create.WIREGUARD_SECURITY_GROUP_NAME) cluster_server_state = [] cluster_keypair_state = [] + cluster_security_group_state = [] tmp_keyname = create.KEY_NAME.format(cluster_id=cluster_id) local_keypairs_deleted = delete_local_keypairs(tmp_keyname) if local_keypairs_deleted or input(f"WARNING: No local temporary keyfiles found for cluster {cluster_id}. " @@ -32,13 +40,14 @@ def terminate_cluster(cluster_id, providers, debug=False): f"Any non-empty input to shutdown cluster {cluster_id}. " f"Empty input to exit with cluster still alive:"): for provider in providers: - LOG.info("Terminating cluster %s on on cloud %s", - cluster_id, provider.cloud_specification['identifier']) + LOG.info("Terminating cluster %s on cloud %s", cluster_id, provider.cloud_specification['identifier']) server_list = provider.list_servers() cluster_server_state += terminate_servers(server_list, cluster_id, provider) cluster_keypair_state.append(delete_keypairs(provider, tmp_keyname)) + cluster_keypair_state.append(delete_security_groups(provider, cluster_id, security_groups)) ac_state = delete_application_credentials(providers[0], cluster_id) - terminate_output(cluster_server_state, cluster_keypair_state, ac_state, cluster_id) + terminate_output(cluster_server_state, cluster_keypair_state, cluster_security_group_state, ac_state, + cluster_id) return 0 @@ -53,11 +62,11 @@ def terminate_servers(server_list, cluster_id, provider): LOG.info("Deleting servers on provider %s...", provider.cloud_specification['identifier']) cluster_server_state = [] # ^(master-{cluster_id}|worker-{cluster_id}|worker-[0-9]+-[0-9]+-{cluster_id})$ - server_regex = re.compile(fr"^bibigrid-(master-{cluster_id}+|(worker|vpnwkr)\d+-{cluster_id}+-\d+)$") + server_regex = re.compile(fr"^bibigrid-(master-{cluster_id}+|(worker\d+|vpngtw)-{cluster_id}+-\d+)$") for server in server_list: if server_regex.match(server["name"]): - LOG.info("Trying to terminate Server %s on cloud %s.", - server['name'], provider.cloud_specification['identifier']) + LOG.info("Trying to terminate Server %s on cloud %s.", server['name'], + provider.cloud_specification['identifier']) cluster_server_state.append(terminate_server(provider, server)) return cluster_server_state @@ -71,11 +80,10 @@ def terminate_server(provider, server): """ terminated = provider.delete_server(server["id"]) if not terminated: - LOG.warning("Unable to terminate server %s on provider %s.", - server['name'], provider.cloud_specification['identifier']) + LOG.warning("Unable to terminate server %s on provider %s.", server['name'], + provider.cloud_specification['identifier']) else: - LOG.info("Server %s terminated on provider %s.", - server['name'], provider.cloud_specification['identifier']) + LOG.info("Server %s terminated on provider %s.", server['name'], provider.cloud_specification['identifier']) return terminated @@ -118,6 +126,42 @@ def delete_local_keypairs(tmp_keyname): return success +def delete_security_groups(provider, cluster_id, security_groups, timeout=5): + """ + Delete configured security groups from provider. 
+ + @param provider: current cloud provider + @param cluster_id: cluster id + @param timeout: how often should delete be attempted + @param has_wireguard: whether wireguard security group has been used + @return: True if all configured security groups can be deleted, false otherwise + """ + LOG.info("Deleting security groups on provider %s...", provider.cloud_specification['identifier']) + success = True + for security_group_format in security_groups: + security_group_name = security_group_format.format(cluster_id=cluster_id) + attempts = 0 + tmp_success = False + while not tmp_success: + try: + tmp_success = provider.delete_security_group(security_group_name) + except ConflictException: + tmp_success = False + if not tmp_success: + if attempts < timeout: + attempts += 1 + time.sleep(1+2 ** attempts) + LOG.info(f"Retrying to delete security group {security_group_name} on " + f"{provider.cloud_specification['identifier']}. Attempt {attempts}/{timeout}") + else: + LOG.error(f"Attempt to delete security group {security_group_name} on " + f"{provider.cloud_specification['identifier']} failed.") + break + LOG.info(f"Delete security_group {security_group_name} -> {tmp_success}") + success = success and tmp_success + return success + + def delete_application_credentials(master_provider, cluster_id): """ Deletes application credentials from the master_provider @@ -130,15 +174,16 @@ def delete_application_credentials(master_provider, cluster_id): if not auth.get("application_credential_id") or not auth.get("application_credential_secret"): return master_provider.delete_application_credential_by_id_or_name(create.AC_NAME.format(cluster_id=cluster_id)) LOG.info("Because you used application credentials to authenticate, " - "no created application credentials need deletion.") + "no created application credentials need deletion.") return True -def terminate_output(cluster_server_state, cluster_keypair_state, ac_state, cluster_id): +def terminate_output(cluster_server_state, cluster_keypair_state, cluster_security_group_state, ac_state, cluster_id): """ Logs the termination result in detail @param cluster_server_state: list of bools. Each bool stands for a server termination @param cluster_keypair_state: list of bools. Each bool stands for a keypair deletion + @param cluster_security_group_state: list of bools. 
Each bool stands for a security group deletion @param ac_state: bool that stands for the deletion of the credentials on the master @param cluster_id: @return: @@ -146,6 +191,7 @@ def terminate_output(cluster_server_state, cluster_keypair_state, ac_state, clus cluster_existed = bool(cluster_server_state) cluster_server_terminated = all(cluster_server_state) cluster_keypair_deleted = all(cluster_keypair_state) + cluster_security_group_deleted = all(cluster_security_group_state) if cluster_existed: if cluster_server_terminated: LOG.info("Terminated all servers of cluster %s.", cluster_id) @@ -155,19 +201,25 @@ def terminate_output(cluster_server_state, cluster_keypair_state, ac_state, clus LOG.info("Deleted all keypairs of cluster %s.", cluster_id) else: LOG.warning("Unable to delete all keypairs of cluster %s.", cluster_id) - if cluster_server_terminated and cluster_keypair_deleted: + if cluster_keypair_deleted: + LOG.info("Deleted all security groups of cluster %s.", cluster_id) + else: + LOG.warning("Unable to delete all security groups of cluster %s.", cluster_id) + + if cluster_server_terminated and cluster_keypair_deleted and cluster_security_group_deleted: out = f"Successfully terminated cluster {cluster_id}." LOG.info(out) print(out) else: LOG.warning("Unable to terminate cluster %s properly." - "\nAll servers terminated: %s\nAll keys deleted: %s", - cluster_id, cluster_server_terminated, cluster_keypair_deleted) + "\nAll servers terminated: %s" + "\nAll keys deleted: %s" + "\nAll security groups deleted: %s", cluster_id, cluster_server_terminated, + cluster_keypair_deleted, cluster_security_group_deleted) if ac_state: LOG.info("Successfully handled application credential of cluster %s.", cluster_id) else: LOG.warning("Unable to delete application credential of cluster %s", cluster_id) else: LOG.warning("Unable to find any servers for cluster-id %s. " - "Check cluster-id and configuration.\nAll keys deleted: %s", - cluster_id, cluster_keypair_deleted) + "Check cluster-id and configuration.\nAll keys deleted: %s", cluster_id, cluster_keypair_deleted) diff --git a/bibigrid/core/actions/version.py b/bibigrid/core/actions/version.py index 0ddbdb45..357f1dce 100644 --- a/bibigrid/core/actions/version.py +++ b/bibigrid/core/actions/version.py @@ -3,4 +3,4 @@ https://www.akeeba.com/how-do-version-numbers-work.html """ -__version__ = "0.2.0" +__version__ = "0.3.0" diff --git a/bibigrid/core/provider.py b/bibigrid/core/provider.py index 1c50c8bb..04af597e 100644 --- a/bibigrid/core/provider.py +++ b/bibigrid/core/provider.py @@ -3,7 +3,7 @@ """ -class Provider: # pylint: disable=too-many-public-methods +class Provider: # pylint: disable=too-many-public-methods """ See in detailed return value information in tests>provider>test_Provider. Make sure to register your newly implemented provider in provider_handler: name:class @@ -21,8 +21,7 @@ def __init__(self, cloud_specification): Call necessary methods to create a connection and save cloud_specification data as needed. """ self.cloud_specification = cloud_specification # contains sensitive information! 
- self.cloud_specification["identifier"] = self.cloud_specification.get('profile') or self.cloud_specification[ - 'auth'].get('project_id') or self.cloud_specification["auth"].get('application_credential_id') or "Unknown" + self.cloud_specification["identifier"] = self.cloud_specification['identifier'] def create_application_credential(self, name=None): """ @@ -80,7 +79,8 @@ def list_servers(self): :return: said list of servers or empty list if none found """ - def create_server(self, name, flavor, image, network, key_name=None, wait=True, volumes=None): # pylint: disable=too-many-arguments + def create_server(self, name, flavor, image, network, key_name=None, wait=True, + volumes=None, security_groups=None): # pylint: disable=too-many-arguments """ Creates a new server and waits for it to be accessible if wait=True. If volumes are given, they are attached. Returns said server (dict) @@ -91,6 +91,7 @@ def create_server(self, name, flavor, image, network, key_name=None, wait=True, :param key_name: (str) :param wait: (bool) :param volumes: List of volumes (list (str)) + :param security_groups: List of security_groups list (str) :return: server (dict) """ @@ -203,8 +204,48 @@ def get_flavors(self): """ def get_active_images(self): + """ + Return a list of active images. + :return: A list of active images. + """ return [image["name"] for image in self.get_images() if image["status"].lower() == "active"] def get_active_flavors(self): return [flavor["name"] for flavor in self.get_flavors() if "legacy" not in flavor["name"].lower() and "deprecated" not in flavor["name"].lower()] + + def set_allowed_addresses(self, id_or_ip, allowed_address_pairs): + """ + Set allowed address (or CIDR) for the given network interface/port + :param id_or_ip: id or ipv4 ip-address of the port/interface + :param allowed_address_pairs: a list of allowed address pairs. For example: + [{ + "ip_address": "23.23.23.1", + "mac_address": "fa:16:3e:c4:cd:3f" + }] + :return: + """ + + def create_security_group(self, name, rules): + """ + Create a security group and add given rules + :param name: Name of the security group to be created + :param rules: List of firewall rules to be added + :return: id of created security group + """ + + def delete_security_group(self, name_or_id): + """ + Delete a security group + :param name_or_id : Name or Id of the security group to be deleted + :return: True if delete succeeded, False otherwise. 
+ + """ + + def append_rules_to_security_group(self, name_or_id, rules): + """ + Append firewall rules to given security group + :param name_or_id: + :param rules: + :return: + """ diff --git a/bibigrid/core/startup.py b/bibigrid/core/startup.py index 69b9c520..2e2e208a 100755 --- a/bibigrid/core/startup.py +++ b/bibigrid/core/startup.py @@ -96,7 +96,9 @@ def run_action(args, configurations, config_path): # pylint: disable=too-many-n if args.cluster_id: if args.terminate_cluster: LOG.info("Action terminate_cluster selected") - exit_state = terminate_cluster.terminate_cluster(args.cluster_id, providers, args.debug) + exit_state = terminate_cluster.terminate_cluster(cluster_id=args.cluster_id, + providers=providers, + debug=args.debug) elif args.ide: LOG.info("Action ide selected") exit_state = ide.ide(args.cluster_id, providers[0], configurations[0]) @@ -116,7 +118,7 @@ def run_action(args, configurations, config_path): # pylint: disable=too-many-n LOG.error(err) exit_state = 2 time_in_s = time.time() - start_time - print(f"--- {math.floor(time_in_s / 60)} minutes and {time_in_s % 60} seconds ---") + print(f"--- {math.floor(time_in_s / 60)} minutes and {round(time_in_s % 60, 2)} seconds ---") return exit_state diff --git a/bibigrid/core/utility/ansible_commands.py b/bibigrid/core/utility/ansible_commands.py index 3fdc7a94..fc6c2815 100644 --- a/bibigrid/core/utility/ansible_commands.py +++ b/bibigrid/core/utility/ansible_commands.py @@ -50,7 +50,7 @@ MV_ANSIBLE_CONFIG = ( "sudo install -D /opt/playbook/ansible.cfg /etc/ansible/ansible.cfg", "Move ansible configuration.") EXECUTE = (f"ansible-playbook {os.path.join(aRP.PLAYBOOK_PATH_REMOTE, aRP.SITE_YML)} -i " - f"{os.path.join(aRP.PLAYBOOK_PATH_REMOTE, aRP.ANSIBLE_HOSTS)} -l master", + f"{os.path.join(aRP.PLAYBOOK_PATH_REMOTE, aRP.ANSIBLE_HOSTS)} -l vpn", "Execute ansible playbook. 
Be patient.") # ansible setup diff --git a/bibigrid/core/utility/ansible_configurator.py b/bibigrid/core/utility/ansible_configurator.py index a853fbcd..a467a5f9 100644 --- a/bibigrid/core/utility/ansible_configurator.py +++ b/bibigrid/core/utility/ansible_configurator.py @@ -3,24 +3,26 @@ """ import logging +import os import mergedeep import yaml from bibigrid.core.actions import create from bibigrid.core.actions import ide -from bibigrid.core.actions import list_clusters -from bibigrid.core.utility.handler import configuration_handler from bibigrid.core.utility import id_generation -from bibigrid.core.utility.paths import ansible_resources_path as aRP from bibigrid.core.utility import yaml_dumper +from bibigrid.core.utility.handler import configuration_handler +from bibigrid.core.utility.paths import ansible_resources_path as aRP +from bibigrid.core.utility.wireguard import wireguard_keys DEFAULT_NFS_SHARES = ["/vol/spool"] ADDITIONAL_PATH = "additional/" PYTHON_INTERPRETER = "/usr/bin/python3" +vpngtw_ROLES = [{"role": "bibigrid", "tags": ["bibigrid", "bibigrid-vpngtw"]}] MASTER_ROLES = [{"role": "bibigrid", "tags": ["bibigrid", "bibigrid-master"]}] WORKER_ROLES = [{"role": "bibigrid", "tags": ["bibigrid", "bibigrid-worker"]}] -VARS_FILES = [aRP.INSTANCES_YML, aRP.CONFIG_YML] +VARS_FILES = [aRP.CONFIG_YML, aRP.HOSTS_YML] IDE_CONF = {"ide": False, "workspace": ide.DEFAULT_IDE_WORKSPACE, "port_start": ide.REMOTE_BIND_ADDRESS, "port_end": ide.DEFAULT_IDE_PORT_END, "build": False} ZABBIX_CONF = {"db": "zabbix", "db_user": "zabbix", "db_password": "zabbix", "timezone": "Europe/Berlin", @@ -30,6 +32,21 @@ "elastic_scheduling": {"SuspendTime": 3600, "ResumeTimeout": 900, "TreeWidth": 128}} LOG = logging.getLogger("bibigrid") + +def delete_old_vars(): + """ + Deletes host_vars and group_vars + @return: + """ + for folder in [aRP.GROUP_VARS_FOLDER, aRP.HOST_VARS_FOLDER]: + for file_name in os.listdir(folder): + # construct full file path + file = os.path.join(folder, file_name) + if os.path.isfile(file): + logging.debug('Deleting file: %s', file) + os.remove(file) + + def generate_site_file_yaml(custom_roles): """ Generates site_yaml (dict). 
@@ -37,48 +54,91 @@ def generate_site_file_yaml(custom_roles): :param custom_roles: ansibleRoles given by the config :return: site_yaml (dict) """ - site_yaml = [{'hosts': 'master', "become": "yes", - "vars_files": VARS_FILES, "roles": MASTER_ROLES}, - {"hosts": "workers", "become": "yes", "vars_files": VARS_FILES, - "roles": WORKER_ROLES}] # , - # {"hosts": "vpnwkr", "become": "yes", "vars_files": copy.deepcopy(VARS_FILES), - # "roles": ["common", "vpnwkr"]}] + site_yaml = [{'hosts': 'master', "become": "yes", "vars_files": VARS_FILES, "roles": MASTER_ROLES}, + {'hosts': 'vpngtw', "become": "yes", "vars_files": VARS_FILES, "roles": vpngtw_ROLES}, + {"hosts": "workers", "become": "yes", "vars_files": VARS_FILES, "roles": WORKER_ROLES}] # , + # {"hosts": "vpngtw", "become": "yes", "vars_files": copy.deepcopy(VARS_FILES), + # "roles": ["common", "vpngtw"]}] # add custom roles and vars for custom_role in custom_roles: VARS_FILES.append(custom_role["vars_file"]) - MASTER_ROLES.append(ADDITIONAL_PATH + custom_role["name"]) - WORKER_ROLES.append(ADDITIONAL_PATH + custom_role["name"]) + for role_group in [MASTER_ROLES, vpngtw_ROLES, WORKER_ROLES]: + role_group.append(ADDITIONAL_PATH + custom_role["name"]) return site_yaml -def generate_instances_yaml(cluster_dict, configuration, provider, cluster_id): # pylint: disable=too-many-locals +def write_host_and_group_vars(configurations, providers, cluster_id): # pylint: disable=too-many-locals """ ToDo filter what information really is necessary. Determined by further development Filters unnecessary information - :param cluster_dict: cluster_dict to get the information from - :param configuration: configuration of master cloud ToDo needs to be list in the future - :param provider: provider of master cloud ToDo needs to be list in the future + :param configurations: configurations + :param providers: providers :param cluster_id: To get proper naming :return: filtered information (dict) """ LOG.info("Generating instances file...") - workers = [] flavor_keys = ["name", "ram", "vcpus", "disk", "ephemeral"] - for index, worker in enumerate(configuration.get("workerInstances", [])): - flavor = provider.get_flavor(worker["type"]) - flavor_dict = {key: flavor[key] for key in flavor_keys} - image = worker["image"] - network = configuration["network"] - worker_range = "[0-{}]" - name = create.WORKER_IDENTIFIER(worker_group=index, cluster_id=cluster_id, - additional=worker_range.format(worker.get('count', 1) - 1)) - regexp = create.WORKER_IDENTIFIER(worker_group=index, cluster_id=cluster_id, - additional=r"\d+") - workers.append({"name": name, "regexp": regexp, "image": image, "network": network, "flavor": flavor_dict}) - master = {key: cluster_dict["master"][key] for key in - ["name", "private_v4", "public_v4", "public_v6", "cloud_specification"]} - master["flavor"] = {key: cluster_dict["master"]["flavor"][key] for key in flavor_keys} - return {"master": master, "workers": workers} + worker_count = 0 + vpn_count = 0 + for configuration, provider in zip(configurations, providers): + configuration_features = configuration.get("features", []) + if isinstance(configuration_features, str): + configuration_features = [configuration_features] + for index, worker in enumerate(configuration.get("workerInstances", [])): + flavor = provider.get_flavor(worker["type"]) + flavor_dict = {key: flavor[key] for key in flavor_keys} + name = create.WORKER_IDENTIFIER(worker_group=index, cluster_id=cluster_id, + additional=f"[{worker_count}-{worker_count + worker.get('count', 1) - 
1}]") + group_name = name.replace("[", "").replace("]", "").replace(":", "_").replace("-", "_") + worker_count += worker.get('count', 1) + regexp = create.WORKER_IDENTIFIER(worker_group=index, cluster_id=cluster_id, additional=r"\d+") + worker_dict = {"name": name, "regexp": regexp, "image": worker["image"], + "network": configuration["network"], "flavor": flavor_dict, + "gateway_ip": configuration["private_v4"], + "cloud_identifier": configuration["cloud_identifier"]} + + worker_features = worker.get("features", []) + if isinstance(worker_features, str): + worker_features = [worker_features] + features = set(configuration_features+worker_features) + if features: + worker_dict["features"] = features + write_yaml(os.path.join(aRP.GROUP_VARS_FOLDER, group_name), worker_dict) + vpngtw = configuration.get("vpnInstance") + if vpngtw: + name = create.VPN_WORKER_IDENTIFIER(cluster_id=cluster_id, additional=f"{vpn_count}") + wireguard_ip = f"10.0.0.{vpn_count + 2}" # skipping 0 and 1 (master) + vpn_count += 1 + flavor = provider.get_flavor(vpngtw["type"]) + flavor_dict = {key: flavor[key] for key in flavor_keys} + regexp = create.WORKER_IDENTIFIER(cluster_id=cluster_id, additional=r"\d+") + vpngtw_dict = {"name": name, "regexp": regexp, "image": vpngtw["image"], + "network": configuration["network"], + "network_cidr": configuration["subnet_cidrs"], + "floating_ip": configuration["floating_ip"], + "private_v4": configuration["private_v4"], + "flavor": flavor_dict, + "wireguard_ip": wireguard_ip, + "cloud_identifier": configuration[ + "cloud_identifier"]} + if configuration.get("wireguard_peer"): + vpngtw_dict["wireguard"] = {"ip": wireguard_ip, + "peer": configuration.get( + "wireguard_peer")} + write_yaml(os.path.join(aRP.HOST_VARS_FOLDER, name), vpngtw_dict) + else: + master = configuration["masterInstance"] + name = create.MASTER_IDENTIFIER(cluster_id=cluster_id) + flavor = provider.get_flavor(master["type"]) + flavor_dict = {key: flavor[key] for key in flavor_keys} + master_dict = {"name": name, "image": master["image"], "network": configuration["network"], + "network_cidr": configuration["subnet_cidrs"], + "floating_ip": configuration["floating_ip"], "flavor": flavor_dict, + "private_v4": configuration["private_v4"], + "cloud_identifier": configuration["cloud_identifier"]} + if configuration.get("wireguard_peer"): + master_dict["wireguard"] = {"ip": "10.0.0.1", "peer": configuration.get("wireguard_peer")} + write_yaml(os.path.join(aRP.GROUP_VARS_FOLDER, "master.yml"), master_dict) def pass_through(dict_from, dict_to, key_from, key_to=None): @@ -96,110 +156,120 @@ def pass_through(dict_from, dict_to, key_from, key_to=None): dict_to[key_to] = dict_from[key_from] -def generate_common_configuration_yaml(cidrs, configuration, cluster_id, ssh_user, default_user): +def generate_common_configuration_yaml(cidrs, configurations, cluster_id, ssh_user, default_user): """ Generates common_configuration yaml (dict) :param cidrs: str subnet cidrs (provider generated) - :param configuration: master configuration (first in file) - :param cluster_id: Id of cluster + :param configurations: master configuration (first in file) + :param cluster_id: id of cluster :param ssh_user: user for ssh connections :param default_user: Given default user :return: common_configuration_yaml (dict) """ + master_configuration = configurations[0] LOG.info("Generating common configuration file...") # print(configuration.get("slurmConf", {})) - common_configuration_yaml = {"cluster_id": cluster_id, "cluster_cidrs": cidrs, - 
"default_user": default_user, - "local_fs": configuration.get("localFS", False), - "local_dns_lookup": configuration.get("localDNSlookup", False), - "use_master_as_compute": configuration.get("useMasterAsCompute", True), - "enable_slurm": configuration.get("slurm", False), - "enable_zabbix": configuration.get("zabbix", False), - "enable_nfs": configuration.get("nfs", False), - "enable_ide": configuration.get("ide", False), - "slurm": configuration.get("slurm", True), "ssh_user": ssh_user, - "slurm_conf": mergedeep.merge({}, SLURM_CONF, configuration.get("slurmConf", {}), - strategy=mergedeep.Strategy.TYPESAFE_REPLACE) - } - if configuration.get("nfs"): - nfs_shares = configuration.get("nfsShares", []) + common_configuration_yaml = {"cluster_id": cluster_id, "cluster_cidrs": cidrs, "default_user": default_user, + "local_fs": master_configuration.get("localFS", False), + "local_dns_lookup": master_configuration.get("localDNSlookup", False), + "use_master_as_compute": master_configuration.get("useMasterAsCompute", True), + "dns_server_list": master_configuration.get("dns_server_list", ["8.8.8.8"]), + "enable_slurm": master_configuration.get("slurm", False), + "enable_zabbix": master_configuration.get("zabbix", False), + "enable_nfs": master_configuration.get("nfs", False), + "enable_ide": master_configuration.get("ide", False), + "slurm": master_configuration.get("slurm", True), "ssh_user": ssh_user, + "slurm_conf": mergedeep.merge({}, SLURM_CONF, + master_configuration.get("slurmConf", {}), + strategy=mergedeep.Strategy.TYPESAFE_REPLACE)} + if master_configuration.get("nfs"): + nfs_shares = master_configuration.get("nfsShares", []) nfs_shares = nfs_shares + DEFAULT_NFS_SHARES - common_configuration_yaml["nfs_mounts"] = [{"src": "/" + nfs_share, "dst": "/" + nfs_share} - for nfs_share in nfs_shares] - common_configuration_yaml["ext_nfs_mounts"] = [{"src": ext_nfs_share, "dst": ext_nfs_share} for - ext_nfs_share in (configuration.get("extNfsShares", []))] + common_configuration_yaml["nfs_mounts"] = [{"src": "/" + nfs_share, "dst": "/" + nfs_share} for nfs_share in + nfs_shares] + common_configuration_yaml["ext_nfs_mounts"] = [{"src": ext_nfs_share, "dst": ext_nfs_share} for ext_nfs_share in + (master_configuration.get("extNfsShares", []))] - if configuration.get("ide"): - common_configuration_yaml["ide_conf"] = mergedeep.merge({}, IDE_CONF, configuration.get("ideConf", {}), + if master_configuration.get("ide"): + common_configuration_yaml["ide_conf"] = mergedeep.merge({}, IDE_CONF, master_configuration.get("ideConf", {}), strategy=mergedeep.Strategy.TYPESAFE_REPLACE) - if configuration.get("zabbix"): - common_configuration_yaml["zabbix_conf"] = mergedeep.merge({}, ZABBIX_CONF, configuration.get("zabbixConf", {}), + if master_configuration.get("zabbix"): + common_configuration_yaml["zabbix_conf"] = mergedeep.merge({}, ZABBIX_CONF, + master_configuration.get("zabbixConf", {}), strategy=mergedeep.Strategy.TYPESAFE_REPLACE) for from_key, to_key in [("waitForServices", "wait_for_services"), ("ansibleRoles", "ansible_roles"), ("ansibleGalaxyRoles", "ansible_galaxy_roles")]: - pass_through(configuration, common_configuration_yaml, from_key, to_key) + pass_through(master_configuration, common_configuration_yaml, from_key, to_key) + + if len(configurations) > 1: + peers = configuration_handler.get_list_by_key(configurations, "wireguard_peer") + common_configuration_yaml["wireguard_common"] = {"mask_bits": 24, "listen_port": 51820, "peers": peers} + return common_configuration_yaml -def 
generate_ansible_hosts_yaml(ssh_user, configuration, cluster_id): +def generate_ansible_hosts_yaml(ssh_user, configurations, cluster_id): """ Generates ansible_hosts_yaml (inventory file). :param ssh_user: str global SSH-username - :param configuration: dict + :param configurations: dict :param cluster_id: id of cluster :return: ansible_hosts yaml (dict) """ LOG.info("Generating ansible hosts file...") - ansible_hosts_yaml = {"master": {"hosts": {"localhost": to_instance_host_dict(ssh_user)}}, - "workers": {"hosts": {}, "children": {"ephemeral": {"hosts": {}}}} - } - # vpnwkr are handled like workers on this level + ansible_hosts_yaml = {"vpn": {"hosts": {}, + "children": {"master": {"hosts": {"localhost": to_instance_host_dict(ssh_user)}}, + "vpngtw": {"hosts": {}}}}, "workers": {"hosts": {}, "children": {}}} + # vpngtw are handled like workers on this level workers = ansible_hosts_yaml["workers"] - for index, worker in enumerate(configuration.get("workerInstances", [])): - name = create.WORKER_IDENTIFIER(worker_group=index, cluster_id=cluster_id, - additional=f"[0:{worker.get('count', 1) - 1}]") - worker_dict = to_instance_host_dict(ssh_user, ip="", local=False) - if "ephemeral" in worker["type"]: - workers["children"]["ephemeral"]["hosts"][name] = worker_dict - else: - workers["hosts"][name] = worker_dict + vpngtws = ansible_hosts_yaml["vpn"]["children"]["vpngtw"]["hosts"] + worker_count = 0 + vpngtw_count = 0 + for configuration in configurations: + for index, worker in enumerate(configuration.get("workerInstances", [])): + name = create.WORKER_IDENTIFIER(worker_group=index, cluster_id=cluster_id, + additional=f"[{worker_count}:{worker_count + worker.get('count', 1) - 1}]") + worker_dict = to_instance_host_dict(ssh_user, ip="") + group_name = name.replace("[", "").replace("]", "").replace(":", "_").replace("-", "_") + # if not workers["children"].get(group_name): # in the current setup this is not needed + workers["children"][group_name] = {"hosts": {}} + workers["children"][group_name]["hosts"][name] = worker_dict + worker_count += worker.get('count', 1) + + if configuration.get("vpnInstance"): + name = create.VPN_WORKER_IDENTIFIER(cluster_id=cluster_id, additional=vpngtw_count) + vpngtw_dict = to_instance_host_dict(ssh_user, ip="") + vpngtw_dict["ansible_host"] = configuration["floating_ip"] + vpngtws[name] = vpngtw_dict + vpngtw_count += 1 return ansible_hosts_yaml -def to_instance_host_dict(ssh_user, ip="localhost", local=True): # pylint: disable=invalid-name +def to_instance_host_dict(ssh_user, ip="localhost"): # pylint: disable=invalid-name """ Generates host entry :param ssh_user: str global SSH-username :param ip: str ip - :param local: bool :return: host entry (dict) """ - host_yaml = {"ansible_connection": "local" if local else "ssh", - "ansible_python_interpreter": PYTHON_INTERPRETER, + host_yaml = {"ansible_connection": "ssh", "ansible_python_interpreter": PYTHON_INTERPRETER, "ansible_user": ssh_user} if ip: host_yaml["ip"] = ip return host_yaml -def get_cidrs(configurations, providers): +def get_cidrs(configurations): """ Gets cidrs of all subnets in all providers :param configurations: list of configurations (dict) - :param providers: list of providers :return: """ all_cidrs = [] - for provider, configuration in zip(providers, configurations): - provider_cidrs = {"provider": type(provider).__name__, "provider_cidrs": []} - if isinstance(configuration["subnet"], list): - for subnet_id_or_name in configuration["subnet"]: - subnet = 
provider.get_subnet_by_id_or_name(subnet_id_or_name) - provider_cidrs["provider_cidrs"].append(subnet["cidr"]) # check key again - else: - subnet = provider.get_subnet_by_id_or_name(configuration["subnet"]) - provider_cidrs["provider_cidrs"].append(subnet["cidr"]) + for configuration in configurations: + subnet = configuration["subnet_cidrs"] + provider_cidrs = {"cloud_identifier": configuration["cloud_identifier"], "provider_cidrs": subnet} all_cidrs.append(provider_cidrs) return all_cidrs @@ -255,9 +325,8 @@ def generate_worker_specification_file_yaml(configurations): worker_specification_yaml = [] for worker_groups_provider_list, network in zip(worker_groups_list, network_list): for worker_group in worker_groups_provider_list: - worker_specification_yaml.append({"TYPE": worker_group["type"], - "IMAGE": worker_group["image"], - "NETWORK": network}) + worker_specification_yaml.append( + {"TYPE": worker_group["type"], "IMAGE": worker_group["image"], "NETWORK": network}) return worker_specification_yaml @@ -277,6 +346,20 @@ def write_yaml(path, generated_yaml, alias=False): yaml.dump(data=generated_yaml, stream=file, Dumper=yaml_dumper.NoAliasSafeDumper) +def add_wireguard_peers(configurations): + """ + Adds wireguard_peer information to configuration + @param configurations: + @return: + """ + if len(configurations) > 1: + for configuration in configurations: + private_key, public_key = wireguard_keys.generate() + configuration["wireguard_peer"] = {"name": configuration["cloud_identifier"], "private_key": private_key, + "public_key": public_key, "ip": configuration["floating_ip"], + "subnet": configuration["subnet_cidrs"]} + + def configure_ansible_yaml(providers, configurations, cluster_id): """ Generates and writes all ansible-configuration-yaml files. 
@@ -285,21 +368,19 @@ def configure_ansible_yaml(providers, configurations, cluster_id): :param cluster_id: id of cluster to create :return: """ + delete_old_vars() LOG.info("Writing ansible files...") alias = configurations[0].get("aliasDumper", False) - cluster_dict = list_clusters.dict_clusters(providers)[cluster_id] ansible_roles = get_ansible_roles(configurations[0].get("ansibleRoles")) default_user = providers[0].cloud_specification["auth"].get("username", configurations[0].get("sshUser", "Ubuntu")) + add_wireguard_peers(configurations) for path, generated_yaml in [ - (aRP.WORKER_SPECIFICATION_FILE, generate_worker_specification_file_yaml(configurations)), - (aRP.COMMONS_CONFIG_FILE, generate_common_configuration_yaml(cidrs=get_cidrs(configurations, providers), - configuration=configurations[0], - cluster_id=cluster_id, - ssh_user=configurations[0]["sshUser"], - default_user=default_user)), - (aRP.COMMONS_INSTANCES_FILE, generate_instances_yaml(cluster_dict, configurations[0], - providers[0], cluster_id)), - (aRP.HOSTS_CONFIG_FILE, generate_ansible_hosts_yaml(configurations[0]["sshUser"], configurations[0], - cluster_id)), + (aRP.WORKER_SPECIFICATION_FILE, generate_worker_specification_file_yaml(configurations)), ( + aRP.COMMONS_CONFIG_FILE, + generate_common_configuration_yaml(cidrs=get_cidrs(configurations), configurations=configurations, + cluster_id=cluster_id, ssh_user=configurations[0]["sshUser"], + default_user=default_user)), + (aRP.HOSTS_CONFIG_FILE, generate_ansible_hosts_yaml(configurations[0]["sshUser"], configurations, cluster_id)), (aRP.SITE_CONFIG_FILE, generate_site_file_yaml(ansible_roles))]: write_yaml(path, generated_yaml, alias) + write_host_and_group_vars(configurations, providers, cluster_id) # writing included in method diff --git a/bibigrid/core/utility/handler/configuration_handler.py b/bibigrid/core/utility/handler/configuration_handler.py index 51e555e3..4c8aa23c 100644 --- a/bibigrid/core/utility/handler/configuration_handler.py +++ b/bibigrid/core/utility/handler/configuration_handler.py @@ -18,10 +18,12 @@ LOG = logging.getLogger("bibigrid") -def read_configuration(path="bibigrid.yml"): + +def read_configuration(path="bibigrid.yml", configuration_list=True): """ - Reads yaml from file and returns the list of all configurations + Reads yaml from file and returns configuration :param path: Path to yaml file + :param configuration_list: True if list is expected :return: configurations (dict) """ configuration = None @@ -33,6 +35,9 @@ def read_configuration(path="bibigrid.yml"): LOG.warning("Couldn't read configuration %s: %s", path, exc) else: LOG.warning("No such configuration file %s.", path) + if configuration_list and not isinstance(configuration, list): + LOG.warning("Configuration should be list. Attempting to rescue by assuming a single configuration.") + return [configuration] return configuration @@ -63,8 +68,8 @@ def find_file_in_folders(file_name, folders): file_path = os.path.expanduser(os.path.join(folder_path, file_name)) if os.path.isfile(file_path): LOG.debug("File %s found in folder %s.", file_name, folder_path) - return read_configuration(file_path) - LOG.debug("File %s in folder %s not found.", file_name, folder_path) + return read_configuration(file_path, False) + LOG.debug("File %s not found in folder %s.", file_name, folder_path) return None @@ -82,8 +87,8 @@ def get_clouds_files(): if not clouds: LOG.warning("%s is not valid. Must contain key '%s:'", CLOUDS_YAML, CLOUD_ROOT_KEY) else: - LOG.warning("No %s at %s! 
Please copy your %s to one of those listed folders. Aborting...", - CLOUDS_YAML, CLOUDS_YAML_PATHS, CLOUDS_YAML) + LOG.warning("No %s at %s! Please copy your %s to one of those listed folders. Aborting...", CLOUDS_YAML, + CLOUDS_YAML_PATHS, CLOUDS_YAML) if clouds_public_yaml: clouds_public = clouds_public_yaml.get(CLOUD_PUBLIC_ROOT_KEY) if not clouds_public: @@ -109,18 +114,21 @@ def get_cloud_specification(cloud_name, clouds, clouds_public): cloud_public_specification = clouds_public.get(public_cloud_name) if not cloud_public_specification: LOG.warning("%s is not a valid profile name. " - "Must be contained under key '%s'", public_cloud_name, CLOUD_PUBLIC_ROOT_KEY) + "Must be contained under key '%s'", public_cloud_name, CLOUD_PUBLIC_ROOT_KEY) else: LOG.debug("Profile found. Merging begins...") try: mergedeep.merge(cloud_full_specification, cloud_public_specification, strategy=mergedeep.Strategy.TYPESAFE_REPLACE) except TypeError as exc: - LOG.warning("Existing %s and %s configuration keys don't match in type: %s", - CLOUDS_YAML, CLOUDS_PUBLIC_YAML, exc) + LOG.warning("Existing %s and %s configuration keys don't match in type: %s", CLOUDS_YAML, + CLOUDS_PUBLIC_YAML, exc) return {} else: LOG.debug("Using only clouds.yaml since no clouds-public profile is set.") + + if not cloud_full_specification.get("identifier"): + cloud_full_specification["identifier"] = cloud_name else: LOG.warning("%s is not a valid cloud name. Must be contained under key '%s'", cloud_name, CLOUD_ROOT_KEY) return cloud_full_specification @@ -133,6 +141,7 @@ def get_cloud_specifications(configurations): @return: list of dicts: cloud_specifications of every configuration """ clouds, clouds_public = get_clouds_files() + LOG.debug("Loaded clouds.yml and clouds_public.yml") cloud_specifications = [] if isinstance(clouds, dict): for configuration in configurations: diff --git a/bibigrid/core/utility/handler/ssh_handler.py b/bibigrid/core/utility/handler/ssh_handler.py index c8b44b8a..222bcfd0 100644 --- a/bibigrid/core/utility/handler/ssh_handler.py +++ b/bibigrid/core/utility/handler/ssh_handler.py @@ -1,54 +1,58 @@ """ -This module handles ssh and sftp connections to master and vpnwkrs. It also holds general execution routines used to +This module handles ssh and sftp connections to master and vpngtw. It also holds general execution routines used to setup the Cluster. """ import logging import os -import time import socket +import time + import paramiko import yaml -from bibigrid.models.exceptions import ConnectionException, ExecutionException from bibigrid.core.utility import ansible_commands as aC +from bibigrid.models.exceptions import ConnectionException, ExecutionException PRIVATE_KEY_FILE = ".ssh/id_ecdsa" # to name bibigrid-temp keys identically on remote ANSIBLE_SETUP = [aC.NO_UPDATE, aC.UPDATE, aC.PYTHON3_PIP, aC.ANSIBLE_PASSLIB, - (f"chmod 600 {PRIVATE_KEY_FILE}","Adjust private key permissions."), + (f"chmod 600 {PRIVATE_KEY_FILE}", "Adjust private key permissions."), aC.PLAYBOOK_HOME, aC.PLAYBOOK_HOME_RIGHTS, aC.ADD_PLAYBOOK_TO_LINUX_HOME] # ANSIBLE_START = [aC.WAIT_READY, aC.UPDATE, aC.MV_ANSIBLE_CONFIG, aC.EXECUTE] # another UPDATE seems to not necessary. 
ANSIBLE_START = [aC.WAIT_READY, aC.MV_ANSIBLE_CONFIG, aC.EXECUTE] -VPN_SETUP = ["echo Example"] +VPN_SETUP = [("echo Example", "Echos an Example")] LOG = logging.getLogger("bibigrid") -def get_ac_command(master_provider, name): +def get_ac_command(providers, name): """ Get command to write application credentials to remote ( - @param master_provider: provider that holds the master + @param providers: providers @param name: how the application credential shall be called @return: command to execute on remote to create application credential """ - master_cloud_specification = master_provider.cloud_specification - auth = master_cloud_specification["auth"] - ac_clouds_yaml = {"clouds": {"master": None}} - if auth.get("application_credential_id") and auth.get("application_credential_secret"): - wanted_keys = ["auth", "region_name", "interface", "identity_api_version", "auth_type"] - ac_cloud_specification = {k: master_cloud_specification[k] for k in wanted_keys if k in - master_cloud_specification} - else: - wanted_keys = ["region_name", "interface", "identity_api_version"] - ac = master_provider.create_application_credential(name=name) # pylint: disable=invalid-name - ac_dict = {"application_credential_id": ac["id"], "application_credential_secret": ac["secret"], - "auth_type": "v3applicationcredential", "auth_url": auth["auth_url"]} - ac_cloud_specification = {k: master_cloud_specification[k] for k in wanted_keys if k in - master_cloud_specification} - ac_cloud_specification.update(ac_dict) - ac_clouds_yaml["clouds"]["master"] = ac_cloud_specification + ac_clouds_yaml = {"clouds": {}} + for provider in providers: + cloud_specification = provider.cloud_specification + auth = cloud_specification["auth"] + if auth.get("application_credential_id") and auth.get("application_credential_secret"): + wanted_keys = ["auth", "region_name", "interface", "identity_api_version", "auth_type"] + ac_cloud_specification = {wanted_key: cloud_specification[wanted_key] for wanted_key in wanted_keys if + wanted_key in + cloud_specification} + else: + wanted_keys = ["region_name", "interface", "identity_api_version"] + ac = provider.create_application_credential(name=name) # pylint: disable=invalid-name + ac_dict = {"application_credential_id": ac["id"], "application_credential_secret": ac["secret"], + "auth_type": "v3applicationcredential", "auth_url": auth["auth_url"]} + ac_cloud_specification = {wanted_key: cloud_specification[wanted_key] for wanted_key in wanted_keys if + wanted_key in + cloud_specification} + ac_cloud_specification.update(ac_dict) + ac_clouds_yaml["clouds"][cloud_specification["identifier"]] = ac_cloud_specification return (f"echo '{yaml.safe_dump(ac_clouds_yaml)}' | sudo install -D /dev/stdin /etc/openstack/clouds.yaml", "Copy application credentials.") @@ -104,8 +108,9 @@ def is_active(client, floating_ip_address, private_key, username, timeout=5): establishing_connection = True while establishing_connection: try: - client.connect(hostname=floating_ip_address, username=username, pkey=private_key, timeout=5, auth_timeout=5) + client.connect(hostname=floating_ip_address, username=username, pkey=private_key, timeout=7, auth_timeout=5) establishing_connection = False + LOG.info(f"Successfully connected to {floating_ip_address}") except paramiko.ssh_exception.NoValidConnectionsError as exc: LOG.info(f"Attempting to connect to {floating_ip_address}... 
This might take a while", ) if attempts < timeout: @@ -123,7 +128,7 @@ def is_active(client, floating_ip_address, private_key, username, timeout=5): raise ConnectionException(exc) from exc except TimeoutError as exc: # pylint: disable=duplicate-except LOG.error("The attempt to connect to %s failed. Possible known reasons:" - "\n\t-Your network's security group doesn't allow SSH.", floating_ip_address) + "\n\t-Your network's security group doesn't allow SSH.", floating_ip_address) raise ConnectionException(exc) from exc @@ -220,10 +225,13 @@ def execute_ssh(floating_ip, private_key, username, commands=None, filepaths=Non LOG.error(f"Couldn't connect to floating ip {floating_ip} using private key {private_key}.") raise exc else: + LOG.debug(f"Setting up {floating_ip}") if filepaths: + LOG.debug(f"Setting up filepaths for {floating_ip}") sftp = client.open_sftp() for localpath, remotepath in filepaths: copy_to_server(sftp=sftp, localpath=localpath, remotepath=remotepath) LOG.debug("SFTP: Files %s copied.", filepaths) if commands: + LOG.debug(f"Setting up commands for {floating_ip}") execute_ssh_cml_commands(client, commands) diff --git a/bibigrid/core/utility/id_generation.py b/bibigrid/core/utility/id_generation.py index 8186d340..51ded9a9 100644 --- a/bibigrid/core/utility/id_generation.py +++ b/bibigrid/core/utility/id_generation.py @@ -44,9 +44,9 @@ def is_unique_cluster_id(cluster_id, providers): for provider in providers: for server in provider.list_servers(): master = create.MASTER_IDENTIFIER(cluster_id=cluster_id) - vpnwkr = create.VPN_WORKER_IDENTIFIER(cluster_id=cluster_id) + vpngtw = create.VPN_WORKER_IDENTIFIER(cluster_id=cluster_id) worker = create.WORKER_IDENTIFIER(cluster_id=cluster_id) - if server["name"] in [master, vpnwkr, worker]: + if server["name"] in [master, vpngtw, worker]: return False return True diff --git a/bibigrid/core/utility/paths/ansible_resources_path.py b/bibigrid/core/utility/paths/ansible_resources_path.py index 070d287d..de4a99d5 100644 --- a/bibigrid/core/utility/paths/ansible_resources_path.py +++ b/bibigrid/core/utility/paths/ansible_resources_path.py @@ -13,10 +13,12 @@ REQUIREMENTS_YML: str = "requirements.yml" UPLOAD_PATH: str = "/tmp/roles/" VARS_PATH: str = "vars/" +GROUP_VARS_PATH: str = "group_vars/" +HOST_VARS_PATH: str = "host_vars/" ROLES_PATH: str = "roles/" LOGIN_YML: str = VARS_PATH + "login.yml" -INSTANCES_YML: str = VARS_PATH + "instances.yml" CONFIG_YML: str = VARS_PATH + "common_configuration.yml" +HOSTS_YML: str = VARS_PATH + "hosts.yml" WORKER_SPECIFICATION_YML: str = VARS_PATH + "worker_specification.yml" ADDITIONAL_ROLES_PATH: str = ROLES_PATH + "additional/" DEFAULT_IP_FILE = VARS_PATH + "{{ ansible_default_ipv4.address }}.yml" @@ -26,16 +28,18 @@ # ANSIBLE_CFG_PATH = os.path.join(bP.RESOURCES_PATH, ANSIBLE_CFG) PLAYBOOK = "playbook/" PLAYBOOK_PATH: str = os.path.join(bP.RESOURCES_PATH, PLAYBOOK) -HOSTS_CONFIG_FILE: str = PLAYBOOK_PATH + ANSIBLE_HOSTS -CONFIG_ROOT_PATH: str = PLAYBOOK_PATH + VARS_PATH -ROLES_ROOT_PATH: str = PLAYBOOK_PATH + ROLES_PATH -COMMONS_LOGIN_FILE: str = PLAYBOOK_PATH + LOGIN_YML -COMMONS_INSTANCES_FILE: str = PLAYBOOK_PATH + INSTANCES_YML -COMMONS_CONFIG_FILE: str = PLAYBOOK_PATH + CONFIG_YML -SITE_CONFIG_FILE: str = PLAYBOOK_PATH + SITE_YML -WORKER_SPECIFICATION_FILE: str = PLAYBOOK_PATH + WORKER_SPECIFICATION_YML +HOSTS_FILE = os.path.join(PLAYBOOK_PATH, HOSTS_YML) +HOSTS_CONFIG_FILE: str = os.path.join(PLAYBOOK_PATH, ANSIBLE_HOSTS) +CONFIG_ROOT_PATH: str = os.path.join(PLAYBOOK_PATH, VARS_PATH) 
+ROLES_ROOT_PATH: str = os.path.join(PLAYBOOK_PATH, ROLES_PATH) +COMMONS_LOGIN_FILE: str = os.path.join(PLAYBOOK_PATH, LOGIN_YML) +COMMONS_CONFIG_FILE: str = os.path.join(PLAYBOOK_PATH, CONFIG_YML) +SITE_CONFIG_FILE: str = os.path.join(PLAYBOOK_PATH, SITE_YML) +WORKER_SPECIFICATION_FILE: str = os.path.join(PLAYBOOK_PATH, WORKER_SPECIFICATION_YML) ADDITIONAL_ROLES_ROOT_PATH: str = ROLES_ROOT_PATH + ADDITIONAL_ROLES_PATH VARS_FOLDER = os.path.join(PLAYBOOK_PATH, VARS_PATH) +GROUP_VARS_FOLDER = os.path.join(PLAYBOOK_PATH, GROUP_VARS_PATH) +HOST_VARS_FOLDER = os.path.join(PLAYBOOK_PATH, HOST_VARS_PATH) # REMOTE ROOT_PATH_REMOTE = "~" @@ -46,7 +50,6 @@ CONFIG_ROOT_PATH_REMOTE: str = PLAYBOOK_PATH_REMOTE + VARS_PATH ROLES_ROOT_PATH_REMOTE: str = PLAYBOOK_PATH_REMOTE + ROLES_PATH COMMONS_LOGIN_FILE_REMOTE: str = PLAYBOOK_PATH_REMOTE + LOGIN_YML -COMMONS_INSTANCES_FILE_REMOTE: str = PLAYBOOK_PATH_REMOTE + INSTANCES_YML COMMONS_CONFIG_FILE_REMOTE: str = PLAYBOOK_PATH_REMOTE + CONFIG_YML SITE_CONFIG_FILE_REMOTE: str = PLAYBOOK_PATH_REMOTE + SITE_YML WORKER_SPECIFICATION_FILE_REMOTE: str = PLAYBOOK_PATH_REMOTE + WORKER_SPECIFICATION_YML diff --git a/bibigrid/core/utility/validate_configuration.py b/bibigrid/core/utility/validate_configuration.py index f6916239..56037842 100644 --- a/bibigrid/core/utility/validate_configuration.py +++ b/bibigrid/core/utility/validate_configuration.py @@ -13,7 +13,7 @@ def evaluate(check_name, check_result): """ - Logs check_resul as warning if failed and as success if succeeded. + Logs check_result as warning if failed and as success if succeeded. :param check_name: :param check_result: :return: @@ -114,8 +114,7 @@ def check_clouds_yaml_security(): if clouds_public: for cloud in clouds_public: if clouds_public[cloud].get("profile"): - LOG.warning(f"{cloud}: Profiles should be placed in clouds.yaml not clouds-public.yaml! " - f"Key ignored.") + LOG.warning(f"{cloud}: Profiles should be placed in clouds.yaml not clouds-public.yaml!") success = False if clouds_public[cloud].get("auth"): for key in ["password", "username", "application_credential_id", "application_credential_secret"]: @@ -192,12 +191,12 @@ def validate(self): """ success = bool(self.providers) LOG.info("Validating config file...") - success = check_provider_data( - configuration_handler.get_list_by_key(self.configurations, "infrastructure"), - len(self.configurations)) and success - if not success: - LOG.warning("Providers not set correctly in configuration file. Check log for more detail.") - return success + # success = check_provider_data( + # configuration_handler.get_list_by_key(self.configurations, "cloud"), + # len(self.configurations)) and success + # if not success: + # LOG.warning("Providers not set correctly in configuration file. Check log for more detail.") + # return success checks = [("master/vpn", self.check_master_vpn_worker), ("servergroup", self.check_server_group), ("instances", self.check_instances), ("volumes", self.check_volumes), ("network", self.check_network), ("quotas", self.check_quotas), @@ -233,7 +232,7 @@ def check_provider_connections(self): providers_unconnectable = [] for provider in self.providers: if not provider.conn: - providers_unconnectable.append(provider.name) + providers_unconnectable.append(provider.cloud_specification["identifier"]) if providers_unconnectable: LOG.warning("API connection to %s not successful. 
Please check your configuration.", providers_unconnectable) @@ -451,8 +450,8 @@ def check_nfs(self): nfs_shares = master_configuration.get("nfsShares") nfs = master_configuration.get("nfs") if nfs_shares and not nfs: - success = True - LOG.warning("nfsShares exist, but nfs is False. nfsShares will be ignored!") + success = False + LOG.warning("nfsShares exist, but nfs is False.") else: success = True return success diff --git a/bibigrid/core/utility/wireguard/wireguard_keys.py b/bibigrid/core/utility/wireguard/wireguard_keys.py new file mode 100755 index 00000000..c25c7316 --- /dev/null +++ b/bibigrid/core/utility/wireguard/wireguard_keys.py @@ -0,0 +1,29 @@ +#!/usr/bin/env python3 +""" +Module for wireguard conforming base64 key creation +""" +import codecs + +from cryptography.hazmat.primitives import serialization +from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey + + +def generate(): + """ + Generates private and public key for wireguard + @return: tuple (privatekey_str, publickey_str) + """ + # generate private key + private_key = X25519PrivateKey.generate() + bytes_ = private_key.private_bytes( + encoding=serialization.Encoding.Raw, + format=serialization.PrivateFormat.Raw, + encryption_algorithm=serialization.NoEncryption() + ) + privatekey_str = codecs.encode(bytes_, 'base64').decode('utf8').strip() + + # derive public key + publickey = private_key.public_key().public_bytes(encoding=serialization.Encoding.Raw, + format=serialization.PublicFormat.Raw) + publickey_str = codecs.encode(publickey, 'base64').decode('utf8').strip() + return privatekey_str, publickey_str diff --git a/bibigrid/models/exceptions.py b/bibigrid/models/exceptions.py index 9691e472..aef5b38c 100644 --- a/bibigrid/models/exceptions.py +++ b/bibigrid/models/exceptions.py @@ -7,3 +7,11 @@ class ConnectionException(Exception): class ExecutionException(Exception): """ Execution exception. 
""" + + +class ConfigurationException(Exception): + """ Configuration exception""" + + +class ConflictException(Exception): + """ Conflict exception""" diff --git a/bibigrid/openstack/openstack_provider.py b/bibigrid/openstack/openstack_provider.py index c27f13c8..d37d57d4 100644 --- a/bibigrid/openstack/openstack_provider.py +++ b/bibigrid/openstack/openstack_provider.py @@ -1,8 +1,9 @@ """ -Concrete implementation of provider.py for openstack +Specific OpenStack implementation for the provider """ import logging +import re import keystoneclient import openstack @@ -14,10 +15,12 @@ from bibigrid.core import provider from bibigrid.core.actions import create from bibigrid.core.actions import version -from bibigrid.models.exceptions import ExecutionException +from bibigrid.models.exceptions import ExecutionException, ConflictException LOG = logging.getLogger("bibigrid") +PATTERN_IPV4 = r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$" + class OpenstackProvider(provider.Provider): # pylint: disable=too-many-public-methods """ @@ -44,46 +47,31 @@ def create_session(self, app_name="openstack_scripts", app_version="1.0"): # print(v3) auth = self.cloud_specification["auth"] if all(key in auth for key in ["auth_url", "application_credential_id", "application_credential_secret"]): - auth_session = v3.ApplicationCredential( - auth_url=auth["auth_url"], + auth_session = v3.ApplicationCredential(auth_url=auth["auth_url"], application_credential_id=auth["application_credential_id"], - application_credential_secret=auth["application_credential_secret"] - ) + application_credential_secret=auth["application_credential_secret"]) elif all(key in auth for key in ["auth_url", "username", "password", "project_id", "user_domain_name"]): - auth_session = v3.Password(auth_url=auth["auth_url"], - username=auth["username"], - password=auth["password"], - project_id=auth["project_id"], - user_domain_name=auth["user_domain_name"]) + auth_session = v3.Password(auth_url=auth["auth_url"], username=auth["username"], password=auth["password"], + project_id=auth["project_id"], user_domain_name=auth["user_domain_name"]) else: raise KeyError("Not enough authentication information in clouds.yaml/clouds-public.yaml " "to create a session. 
Use one:\n" "Application Credentials: auth_url, application_credential_id and " "application_credential_secret\n" "Password: auth_url, username, password, project_id and user_domain_name") - return session.Session(auth=auth_session, - app_name=app_name, app_version=app_version) + return session.Session(auth=auth_session, app_name=app_name, app_version=app_version) def create_connection(self, app_name="openstack_bibigrid", app_version=version.__version__): auth = self.cloud_specification["auth"] - return openstack.connect( - load_yaml_config=False, - load_envvars=False, - auth_url=auth["auth_url"], - project_name=auth.get("project_name"), - username=auth.get("username"), - password=auth.get("password"), - region_name=self.cloud_specification["region_name"], - user_domain_name=auth.get("user_domain_name"), - project_domain_name=auth.get("user_domain_name"), - app_name=app_name, - app_version=app_version, + return openstack.connect(load_yaml_config=False, load_envvars=False, auth_url=auth["auth_url"], + project_name=auth.get("project_name"), username=auth.get("username"), password=auth.get("password"), + region_name=self.cloud_specification["region_name"], user_domain_name=auth.get("user_domain_name"), + project_domain_name=auth.get("user_domain_name"), app_name=app_name, app_version=app_version, application_credential_id=auth.get("application_credential_id"), application_credential_secret=auth.get("application_credential_secret"), interface=self.cloud_specification.get("interface"), identity_api_version=self.cloud_specification.get("identity_api_version"), - auth_type=self.cloud_specification.get("auth_type") - ) + auth_type=self.cloud_specification.get("auth_type")) def create_application_credential(self, name=None): return self.keystone_client.application_credentials.create(name=name).to_dict() @@ -124,17 +112,17 @@ def get_subnet_by_id_or_name(self, subnet_id_or_name): def list_servers(self): return [elem.toDict() for elem in self.conn.list_servers()] - def create_server(self, name, flavor, image, - network, key_name=None, wait=True, volumes=None): + def create_server(self, name, flavor, image, network, key_name=None, wait=True, volumes=None, security_groups=None): try: - server = self.conn.create_server(name=name, flavor=flavor, image=image, - network=network, key_name=key_name, volumes=volumes) + server = self.conn.create_server(name=name, flavor=flavor, image=image, network=network, key_name=key_name, + volumes=volumes, security_groups=security_groups) except openstack.exceptions.BadRequestException as exc: raise ConnectionError() from exc except openstack.exceptions.SDKException as exc: raise ExecutionException() from exc except AttributeError as exc: - raise ExecutionException("Unable to create server due to faulty configuration.") from exc + raise ExecutionException("Unable to create server due to faulty configuration.\n" + "Check your configuration using `-ch` instead of `-c`.") from exc if wait: self.conn.wait_for_server(server=server, auto_ip=False, timeout=600) server = self.conn.get_server(server["id"]) @@ -147,8 +135,7 @@ def delete_server(self, name_or_id, delete_ips=True): :param delete_ips: :return: """ - return self.conn.delete_server(name_or_id=name_or_id, wait=False, - timeout=180, delete_ips=delete_ips, + return self.conn.delete_server(name_or_id=name_or_id, wait=False, timeout=180, delete_ips=delete_ips, delete_ip_retry=1) def delete_keypair(self, key_name): @@ -161,7 +148,12 @@ def close(self): return self.conn.close() def create_keypair(self, name, public_key): - 
return self.conn.create_keypair(name=name, public_key=public_key) + # When running a multicloud approach on the same provider and same account, + # make sure that the keypair is only created once. + try: + return self.conn.create_keypair(name=name, public_key=public_key) + except openstack.exceptions.ConflictException: + return self.conn.get_keypair(name) def get_network_id_by_subnet(self, subnet): subnet = self.conn.get_subnet(subnet) @@ -181,16 +173,14 @@ def get_free_resources(self): compute_limits = dict(self.conn.compute.get_limits()["absolute"]) # maybe needs limits.get(os.environ["OS_PROJECT_NAME"]) in the future volume_limits_generator = self.cinder.limits.get().absolute - volume_limits = {absolut_limit.name: absolut_limit.value for absolut_limit in - volume_limits_generator} + volume_limits = {absolut_limit.name: absolut_limit.value for absolut_limit in volume_limits_generator} # ToDo TotalVolumeGigabytes needs totalVolumeGigabytesUsed, but is not given volume_limits["totalVolumeGigabytesUsed"] = 0 free_resources = {} for key in ["total_cores", "floating_ips", "instances", "total_ram"]: free_resources[key] = compute_limits[key] - compute_limits[key + "_used"] for key in ["Volumes", "VolumeGigabytes", "Snapshots", "Backups", "BackupGigabytes"]: - free_resources[key] = volume_limits["maxTotal" + key] - volume_limits[ - "total" + key + "Used"] + free_resources[key] = volume_limits["maxTotal" + key] - volume_limits["total" + key + "Used"] return free_resources def get_volume_by_id_or_name(self, name_or_id): @@ -258,3 +248,71 @@ def get_flavors(self): @return: A generator able ot generate all flavors """ return self.conn.compute.flavors() + + def set_allowed_addresses(self, id_or_ip, allowed_address_pairs): + """ + Set allowed address (or CIDR) for the given network interface/port + :param id_or_ip: id or ip-address of the port/interface + :param allowed_address_pairs: a list of allowed address pairs. For example: + [{ + "ip_address": "23.23.23.1", + "mac_address": "fa:16:3e:c4:cd:3f" + }] + :return updated port: + """ + # get port id if ip address is given + if re.match(PATTERN_IPV4, id_or_ip): + for port in self.conn.list_ports(): + for fixed_ip in port["fixed_ips"]: + if fixed_ip["ip_address"] == id_or_ip: + id_or_ip = port["id"] + break + + return self.conn.update_port(id_or_ip, allowed_address_pairs=allowed_address_pairs) + + def create_security_group(self, name, rules=None): + """ + Create a security group and add given rules + :param name: Name of the security group to be created + :param rules: List of firewall rules in the following format. + rules = [{ "direction": "ingress" | "egress", + "ethertype": "IPv4" | "IPv6", + "protocol": "tcp" | "udp" | "icmp" | None + "port_range_min": None | 1 - 65535 + "port_range_max": None | 1 - 65535 + "remote_ip_prefix": | None + "remote_group_id" | None }, + { ... } ] + + + :return: created security group + """ + security_group = self.conn.create_security_group(name, f"Security group for {name}.") + if rules is not None: + self.append_rules_to_security_group(security_group["id"], rules) + return security_group + + def delete_security_group(self, name_or_id): + """ + Delete a security group + :param name_or_id : Name or Id of the security group to be deleted + :return: True if delete succeeded, False otherwise.
+ """ + try: + return self.conn.delete_security_group(name_or_id) + except openstack.exceptions.ConflictException as exc: + raise ConflictException from exc + + def append_rules_to_security_group(self, name_or_id, rules): + """ + Append firewall rules to given security group + :param name_or_id: + :param rules: + :return: + """ + for rule in rules: + self.conn.create_security_group_rule(name_or_id, direction=rule["direction"], ethertype=rule["ethertype"], + protocol=rule["protocol"], port_range_min=rule["port_range_min"], + port_range_max=rule["port_range_max"], + remote_ip_prefix=rule["remote_ip_prefix"], + remote_group_id=rule["remote_group_id"]) diff --git a/documentation/markdown/bibigrid_feature_list.md b/documentation/markdown/bibigrid_feature_list.md index c5cae99e..5e4b9a82 100644 --- a/documentation/markdown/bibigrid_feature_list.md +++ b/documentation/markdown/bibigrid_feature_list.md @@ -12,5 +12,6 @@ | [Cloud Specification Data](features/cloud_specification_data.md) | Contains necessary data to establish a general connection to the provider. | | [Configuration](features/configuration.md) | Contains all data regarding cluster setup for all providers. | | [Command Line Interface](features/CLI.md) | What command line arguments can be passed into BiBiGrid. | +| [Multi Cloud](features/multi_cloud.md) | Explanation how BiBiGrid's multi-cloud approach works | ![](../images/actions.jpg) \ No newline at end of file diff --git a/documentation/markdown/bibigrid_software_list.md b/documentation/markdown/bibigrid_software_list.md index aca6a829..754c5368 100644 --- a/documentation/markdown/bibigrid_software_list.md +++ b/documentation/markdown/bibigrid_software_list.md @@ -7,3 +7,5 @@ | [Theia IDE](software/theia_ide.md) | Theia IDE is a Web IDE, build using the Theia Framework, that allows easy, intuitive and abstract **web access** to cluster nodes. Theia IDE is optional. | [Using "Theia" as an End User](https://theia-ide.org/docs/user_getting_started/) | | [Zabbix](software/zabbix.md) | Zabbix is an open source **monitoring** solution for networks, servers, clouds, applications and services. Zabbix is optional. | [What is Zabbix](https://www.zabbix.com/documentation/current/en/manual/introduction/about) | | [NFS](software/nfs.md) | Network File System allows file access over a network similarly to local storage access. | [Getting started with NFS](https://www.redhat.com/sysadmin/getting-started-nfs) | +| [Wireguard](software/wireguard.md) | Simple and fast VPN solution | [Quick Start](https://www.wireguard.com/quickstart/) | +| [Dnsmasq](software/dnsmasq.md) | A lightweight DHCP and caching DNS server. | [Dnsmasq Documentation](https://thekelleys.org.uk/dnsmasq/doc.html) | diff --git a/documentation/markdown/features/CLI.md b/documentation/markdown/features/CLI.md index baca937d..e21089d7 100644 --- a/documentation/markdown/features/CLI.md +++ b/documentation/markdown/features/CLI.md @@ -1,17 +1,24 @@ # CLI + Available command line parameters: + - `-h, --help` show help message and exit -- `-v, --verbose` Increases output verbosity (can be of great use when cluster fails to start). `-v` adds more detailed info to the logfile, `-vv` adds debug information to the logfile. +- `-v, --verbose` Increases output verbosity (can be of great use when cluster fails to start). `-v` adds more + detailed info to the logfile, `-vv` adds debug information to the logfile. - `-d, --debug` Keeps cluster active in case of an error. Offers termination after successful create. 
-- `-i , --config_input (required)` Path to YAML configurations file. Relative paths can be used and start at `~/.config/bibigrid` -- `-cid , --cluster_id ` Cluster id is needed for ide and termination. If no cluster id is set, the last started cluster's id will be used (except for `list_clusters`). +- `-i , --config_input (required)` Path to YAML configurations file. Relative paths can be used and start + at `~/.config/bibigrid` +- `-cid , --cluster_id ` Cluster id is needed for ide and termination. If no cluster id is set, + the last started cluster's id will be used (except for `list_clusters`). + ## Mutually exclusive actions: choose exactly one + - `-V, --version` Displays version. - `-t, --terminate_cluster` Terminates cluster. Needs cluster-id set. - `-c, --create` Creates cluster. - `-l, --list_clusters` Lists all running clusters. If cluster-id is - set, will list this cluster in detail only. + set, will list this cluster in detail only. - `-ch, --check` Validates cluster configuration. - `-ide, --ide` Establishes a secured connection to ide. - Needs cluster-id set. + Needs cluster-id set. - `-u, --update` Updates master's playbook. Needs cluster-id set, no job running and no workers powered up. \ No newline at end of file diff --git a/documentation/markdown/features/bibigrid_ansible_playbook.md b/documentation/markdown/features/bibigrid_ansible_playbook.md new file mode 100644 index 00000000..d3176800 --- /dev/null +++ b/documentation/markdown/features/bibigrid_ansible_playbook.md @@ -0,0 +1,3 @@ +# BiBiGrid's Ansible Playbook + +TODO \ No newline at end of file diff --git a/documentation/markdown/features/check.md b/documentation/markdown/features/check.md index c92c8a81..4ff4d5db 100644 --- a/documentation/markdown/features/check.md +++ b/documentation/markdown/features/check.md @@ -1 +1,87 @@ -# Check \ No newline at end of file +# Check + +## Exactly one master or vpn instance per configuration + +There can only be a single master or a single vpn-gateway per configuration. + +## Given Server group exist + +If a server group is defined in the [Configuration](configuration.md), it must exist in the cloud. + +## All instances are defined correctly + +### All instances' images and flavors are compatible + +Images have a minimal requirement for ram and disk space that a flavor needs to fulfil. If the given flavor fulfils the +image's requirement, the check is successful. + +## All MasterMounts exist as snapshots or volumes + +If any `MasterMounts` are defined in the [Configuration](configuration.md#mastermounts--optional-), they must exist in +the cloud as volumes or +snapshots (if they exist as snapshots, volumes will be created of them during creation). + +## Network or Subnet is given and exists + +A network or subnet must be defined in the [Configuration](configuration.md#subnet--required-), and it must exist in the +cloud as well. + +## Quotas are not exceeded + +Total cores, floating-ips (not working as OpenStack doesn't return the correct value), instances number, total ram, +volumes, volume gigabytes and snapshots are compared to the expected usage of the cluster. If the required resources +fit, the check is successful. Total cores and ram is used as OpenStack doesn't provide a feasible extraction +option for current usage. + +## All public key files exist + +If any additional public key files are defined in the [Configuration](configuration.md#sshpublickeyfiles--optional-), +the public key file must actually exist on the local machine. 
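The following is a minimal sketch of such an existence check, not BiBiGrid's exact implementation; it only assumes that paths given under `sshPublicKeyFiles` may contain `~`:

```python
import os


def public_key_files_exist(public_key_paths):
    """Return True only if every listed public key file exists on the local machine."""
    return all(os.path.isfile(os.path.expanduser(path)) for path in public_key_paths)
```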
+ +### All public key files are secure (not failing) + +BiBiGrid will also check whether your key is considered secure. Currently `RSA: >=4096`, `ECDSA: >=521` +and `ED25519: >=256` +are whitelisted. This check will only print a warning, but will not fail the check. + +## All clouds yaml entries are secure and valid + +If this check doesn't succeed, downloading the `clouds.yaml` again might be the fastest solution. +You can read more about these files here [Cloud-specification](cloud_specification_data.md) + +### Valid + +A `clouds.yaml` entry is considered valid if - combined with any `clouds-public.yaml` entries it refers to using the +profile +key - the following keys exist. Additional keys may be set, but are not required for the check to be successful. + +#### Password + +```yaml + openstack: + auth: + username: + password: + auth_url: + region_name: +``` + +#### Application Credential + +```yaml + openstack_giessen: + auth: + auth_url: + application_credential_id: + application_credential_secret: + region_name: + auth_type: "v3applicationcredential" +``` + +### Secure + +A cloud-specification setup is considered secure if the `clouds-public.yaml` doesn't +contain `password`, `username`, `application_credential_id`, +`profile` or `application_credential_secret`. + +## If NFS shares are given, nfs must be set to True \ No newline at end of file diff --git a/documentation/markdown/features/cloud_specification_data.md b/documentation/markdown/features/cloud_specification_data.md index 5230b6a8..df4457ef 100644 --- a/documentation/markdown/features/cloud_specification_data.md +++ b/documentation/markdown/features/cloud_specification_data.md @@ -1,18 +1,25 @@ # Cloud Specification Data -To access the cloud, authentication information is required. The BiBiGrid no longer uses environment variables, but a two file system instead. -`clouds.yaml` and `clouds-public.yaml` can be placed in `~/.config/bibigrid/` or `/etc/bibigrid/` and will be loaded by BiBiGrid on execution. -While you store your password and username in `clouds.yaml` (private), you can store all other information ready to share in `clouds-public.yaml` (shareable). -However, all information can just be stored in `clouds.yaml`. -Keys set in `clouds.yaml` will overwrite keys from `clouds-public.yaml`. +To access clouds, authentication information is required. BiBiGrid no longer uses environment variables, but an already +established two file system instead. +Those two files, `clouds.yaml` and `clouds-public.yaml`, can be placed in `~/.config/bibigrid/` or `/etc/bibigrid/`. -## Openstack -Be aware that the downloaded `clouds.yaml` file contains all information. -OpenStack does not split information into `clouds.yaml` and `clouds-public.yaml` on its own. -The example files show an example split. +While you should store your password and username in `clouds.yaml` (private), you can store all other information ready +to share in `clouds-public.yaml` (shareable). However, splitting information is not necessary. +You can work with just a `clouds.yaml` containing all the data, too. OpenStack will not split information into both +files, but just provides a single large `clouds.yaml`. -### Password Example -Using the password `clouds.yaml` is easy. However, since passwords - unlike [Application Credentials](#application-credentials-example) +Be aware that keys set in `clouds.yaml` will overwrite keys from `clouds-public.yaml`. 
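The snippet below is a minimal sketch of that precedence rule using `mergedeep` (the merge library the codebase already uses); it is not BiBiGrid's exact internal merge call, and the entry contents are made up:

```python
import mergedeep

# Hypothetical clouds-public.yaml entry (shareable part).
clouds_public_entry = {
    "auth": {"auth_url": "https://somelink:someport", "project_name": "someProjectName"},
    "region_name": "someRegionName",
    "interface": "public",
}

# Hypothetical clouds.yaml entry (private part); its keys take precedence.
clouds_entry = {
    "auth": {"username": "SamSampleman", "password": "SecurePassword"},
    "interface": "internal",  # deliberately conflicts with the clouds-public.yaml entry
}

# With mergedeep, later sources win, so merging the private entry last mirrors the
# documented behaviour: keys from clouds.yaml overwrite keys from clouds-public.yaml.
merged = mergedeep.merge({}, clouds_public_entry, clouds_entry,
                         strategy=mergedeep.Strategy.TYPESAFE_REPLACE)
print(merged["interface"])          # -> "internal" (clouds.yaml wins)
print(merged["auth"]["auth_url"])   # -> "https://somelink:someport" (kept from clouds-public.yaml)
```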
+ +In order to authenticate you either need to provide your [username and password](#password-example), +or provide an [application credentials](#application-credentials-example) (safer). + +## Providers +### Openstack +#### Password Example + +Using the password `clouds.yaml` is easy. However, since passwords - +unlike [Application Credentials](#application-credentials-example) don't have an expiration date, caution is advised. ![Download](../../images/features/cloud_specification_data/pw_screen1.png) @@ -20,6 +27,7 @@ don't have an expiration date, caution is advised. Move the downloaded file to `~/.config/bibigrid/` or `/etc/bibigrid/`. ##### Password clouds.yaml + ```yaml clouds: openstack: @@ -30,6 +38,7 @@ clouds: ``` ##### Password clouds-public.yaml + ```yaml public-clouds: nameOfCloudsPublicYamlEntry: @@ -42,9 +51,11 @@ public-clouds: interface: "public" identity_api_version: 3 ``` -### Application Credentials Example -The following show, how an Application Credential can be created and the related `clouds.yaml` downloaded. -Application Credentials are the preferred way of authentication since they do have an expiration date and + +#### Application Credentials Example + +The following shows, how to create an application credential and how to download the related `clouds.yaml`. +Application Credentials are the preferred way of authentication since they do have an expiration date and their access can be limited. ![Navigation](../../images/features/cloud_specification_data/ac_screen1.png) @@ -53,7 +64,8 @@ their access can be limited. Move the downloaded file to `~/.config/bibigrid/` or `/etc/bibigrid/`. -#### Application Credential clouds.yaml +##### Application Credential clouds.yaml + ```yaml clouds: openstack: @@ -63,7 +75,8 @@ clouds: application_credential_secret: SecureSecret ``` -#### Application Credential clouds-public.yaml +##### Application Credential clouds-public.yaml + ```yaml public-clouds: nameOfCloudsPublicYamlEntry: @@ -73,4 +86,49 @@ public-clouds: interface: "public" identity_api_version: 3 auth_type: "v3applicationcredential" +``` + +## The identifier Key +In the examples below you will see the `identifier` key. That key doesn't exist in regular `clouds.yaml` files. +This key allows you to use a non-unique identifier that will be shown in debugging and as your slurm partition name. +You do not need to set it. In that case the key under `clouds` is taken (`openstack` in the examples above). + +## Multiple Clouds +If you are using BiBiGrid's Multi Cloud setup, you just need to save the information +of both clouds into a single `clouds.yaml` and optionally `clouds-public.yaml`. +Make sure that they are named differently. 
+ +### Multiple Clouds Example + +```yaml +clouds: + openstack_location1: + profile: nameOfCloudsPublicYamlEntry + auth: + username: SamSampleman + password: SecurePassword + openstack_location2: + auth: + auth_url: https://somelink:someport + application_credential_id: SomeID + application_credential_secret: SecureSecret + region_name: SomeRegion + identifier: location2 + interface: "public" + identity_api_version: 3 + auth_type: "v3applicationcredential" +``` + +```yaml +public-clouds: + nameOfCloudsPublicYamlEntry: + auth: + auth_url: https://somelink:someport + project_id: someProjectId + project_name: someProjectName + user_domain_name: someDomainName + region_name: someRegionName + interface: "public" + identifier: location1 + identity_api_version: 3 ``` \ No newline at end of file diff --git a/documentation/markdown/features/configuration.md b/documentation/markdown/features/configuration.md index 99fccbee..9237f4c4 100644 --- a/documentation/markdown/features/configuration.md +++ b/documentation/markdown/features/configuration.md @@ -1,40 +1,53 @@ # Configuration +> **Note** +> +> First take a look at our [Hands-On BiBiGrid Tutorial](https://github.com/deNBI/bibigrid_clum2022) and our +> example [bibigrid.yml](../../../bibigrid.yml). +> This documentation is no replacement for those, but provides greater detail - too much detail for first time users. + The configuration file (often called `bibigrid.yml`) contains important information about cluster creation. -The cluster configuration holds a list of configurations where each configuration is assigned to a specific provider -(location). That allows cluster to stretch over multiple providers. The configuration file is best stored in -`~/.config/bibigrid/` since BiBiGrid starts its relative search there. +The cluster configuration holds a list of configurations where each configuration has a specific +cloud (location) and infrastructure (e.g. OpenStack). For single-cloud use cases you just need a single configuration. +However, you can use additional configurations to set up a multi-cloud. + +The configuration file is best stored in `~/.config/bibigrid/`. BiBiGrid starts its relative search there. ## Configuration List -The first configuration is always the master's provider configuration. -Only the first configuration is allowed to have a master key. -Every following configuration describes a provider that is not the master's provider containing a number of worker and a -vpnwkr (vpn worker). The vpnwkr is a worker with a floating IP. That allows the master - that knows all vpnwkrs to access -all workers using the floating IP as an entry point into the other local networks. However, all that will be covered by -an abstraction layer using a virtual network. Therefore, end users can work on a spread cluster without noticing it. - -### Master Provider Configuration -As mentioned before, the first configuration has a master key. Apart from that it also holds all information that is - -simply put - true over the entire cluster. We also call those keys global. -Keys that belong only to a single provider configuration are called local. -For example whether the master works alongside the workers is a general fact. -Therefore, it is stored within the first configuration. The master provider configuration. +If you have a single-cloud use case, you can [skip ahead](). + +Only the first configuration holds a `master` key (also called `master configuration`). +Every following configuration must hold a `vpngtw` key. 
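+
+A rough sketch of such a configuration list (flavors, images and cloud names are placeholders; only the keys relevant
+here are shown) might look like this:
+
+```yaml
+- infrastructure: openstack        # first configuration: holds the master and all global keys
+  cloud: openstack_location1
+  masterInstance:
+    type: de.NBI tiny
+    image: Ubuntu 22.04 LTS (2022-10-14)
+- infrastructure: openstack        # every further configuration: holds a vpngtw
+  cloud: openstack_location2
+  vpngtw:
+    type: de.NBI tiny
+    image: Ubuntu 22.04 LTS (2022-10-14)
+```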
+ +Later, this vpngtw allows BiBiGrid to connect multiple clouds. + +[Here](multi_cloud.md) you can get a technical overview regarding BiBiGrid's multi-cloud setup. + +### General Cluster Information +Apart from the master key the master configuration (first configuration) also holds all information that is - +simply put - true over the entire cluster. We also call those keys `global`. Keys that belong only to a single cloud configuration are called `local`. + +For example whether the master works alongside the workers is a general fact (global). +Therefore, it is stored within the master configuration. ## Keys ### Global #### sshPublicKeyFiles (optional) -`sshPublicKeyFiles` expects a list of public keyfiles to be registered on every node. That allows you to grant access to -created clusters to the owners of the private keyfile. For example, you can add colleges public key to the list and allow -him to access your started cluster later on to debug it. + +`sshPublicKeyFiles` expects a list of public keyfiles to be registered on every instance. After cluster creation, you +or others can use the corresponding private key to log into the instances. + +```yaml +sshPublicKeyFiles: + - /home/user/.ssh/id_ecdsa_colleague.pub +``` #### masterMounts (optional) -`masterMounts` expects a list of volumes or snapshots that will then be attached to the master. If any snapshots are -given, the related volumes are first created and then those volumes are used by BiBiGrid. Those volumes are not deleted -after Cluster termination. -[[Link to mounting infomation]] #ToDo +`masterMounts` expects a list of volumes and snapshots. Those will be attached to the master. If any snapshots are +given, volumes are first created from them. Volumes are not deleted after Cluster termination.
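+
+For example (the names below are placeholders for volumes or snapshots that already exist in your project):
+
+```yaml
+masterMounts:
+  - nameOfMyVolume
+  - nameOfMySnapshot
+```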
@@ -42,14 +55,16 @@ after Cluster termination. [Mounting](https://man7.org/linux/man-pages/man8/mount.8.html) adds a new filesystem to the file tree allowing access. -
+#### nfsShares (optional)
+`nfsShares` expects a list of folder paths to be shared over the network via NFS.
+`/vol/spool/` is always an nfsShare, even if it is not listed here.
-#### nfsShares (optional)
-`nfsShares` expects a list of folder paths to share using nfs. In every case, `/vol/spool/` is always an nfsShare.
-This key is only relevant if the [nfs key](#nfs--optional-) is set `True`.
+This key is only relevant if the [nfs key](#nfs-optional) is set `True`.
+
+If you would like to share a [masterMount](#mastermounts-optional), take a look [here](../software/nfs.md#mount-volume-into-share).
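+
+For example (the path is a placeholder), together with the [nfs key](#nfs-optional):
+
+```yaml
+nfs: True
+nfsShares:
+  - /vol/data
+```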
@@ -60,17 +75,22 @@ NFS (Network File System) is a stable and well-functioning network protocol for
#### ansibleRoles (optional) -Yet to be explained. -``` + +Yet to be explained and implemented. + +```yaml - file: SomeFile hosts: SomeHosts name: SomeName vars: SomeVars vars_file: SomeVarsFile ``` + #### ansibleGalaxyRoles (optional) -Yet to be explained. -``` + +Yet to be explained and implemented. + +```yaml - hosts: SomeHost name: SomeName galaxy: SomeGalaxy @@ -81,124 +101,182 @@ Yet to be explained. ``` #### localFS (optional) -This key helps some users to create a filesystem to their liking. It is not used in general. + +In general, this key is ignored. +It expects `True` or `False` and helps some specific users to create a filesystem to their liking. Default is `False`. #### localDNSlookup (optional) -If `True`, master will store the link to his workers. This is called -[Local DNS Lookup](https://helpdeskgeek.com/networking/edit-hosts-file/). + +If `True`, master will store DNS information for his workers. Default is `False`. +[More information](https://helpdeskgeek.com/networking/edit-hosts-file/). + +#### slurm +If `False`, the cluster will start without the job scheduling system slurm. +This is relevant to the fewest. Default is `True`. #### zabbix (optional) -If `True`, the monitoring solution [zabbix](https://www.zabbix.com/) will be installed on the master. + +If `True`, the monitoring solution [zabbix](https://www.zabbix.com/) will be installed on the master. Default is `False`. #### nfs (optional) -If `True`, nfs is created. -
- -What is NFS? - +If `True`, [nfs](../software/nfs.md) is set up. Default is `False`. -NFS (Network File System) is a stable and well-functioning network protocol for exchanging files over the local network. -
+#### ide (optional) + +If `True`, [Theia Web IDE](../software/theia_ide.md) is installed. +After creation connection information is [printed](../features/create.md#prints-cluster-information). #### useMasterAsCompute (optional) -Default the master always works together with the workers on submitted jobs. If you set `useMasterWithPublicIp` - to `False` the master will instead no longer support the workers. + +If `False`, master will no longer help workers to process jobs. Default is `True`. #### waitForServices (optional): -Expects a list of services to wait for. This is required if your provider has any post-launch services. If not set, -seemingly random errors can occur when the service interrupts the ansible execution. Providers and their services are -listed on [de.NBI Wiki](https://cloud.denbi.de/wiki/) at `Computer Center Specific`. + +Expects a list of services to wait for. +This is required if your provider has any post-launch services interfering with the package manager. If not set, +seemingly random errors can occur when the service interrupts ansible's execution. Services are +listed on [de.NBI Wiki](https://cloud.denbi.de/wiki/) at `Computer Center Specific` (not yet). ### Local #### infrastructure (required) + `infrastructure` sets the used provider implementation for this configuration. Currently only `openstack` is available. -Other infrastructures would be AWS and so on. +Other infrastructures would be [AWS](https://aws.amazon.com/) and so on. #### cloud -`cloud` decides which entry in the `clouds.yaml` is used. -When using OpenStack the downloaded `clouds.yaml` is named `openstack` -`cloud: openstack` +`cloud` decides which entry in the `clouds.yaml` is used. When using OpenStack the entry is named `openstack`. +You can read more about the `clouds.yaml` [here](cloud_specification_data.md). -#### workerInstances (optional) -`workerInstances` expects a list of workers to be used on this specific provider the configuration is for. -`Instances` are also called `servers`. +#### workerInstances (optional) -``` +`workerInstances` expects a list of worker groups (instance definitions with `count` key). +If `count` is omitted, `count: 1` is assumed. + +```yaml workerInstance: - type: de.NBI tiny image: Ubuntu 22.04 LTS (2022-10-14) count: 2 ``` -- `type` sets the instance's hardware configuration. Also called `flavor` sometimes. + +- `type` sets the instance's hardware configuration. - `image` sets the bootable operating system to be installed on the instance. -- `count` sets how many workers of that `type` `image` combination are to be used by the cluster +- `count` sets how many workers of that `type` `image` combination are in this work group -Find your active `images`: +##### Find your active `images` -``` +```commandline openstack image list --os-cloud=openstack | grep active ``` -Currently, images based on Ubuntu 20.04/22.04 (Focal/Jammy) and Debian 11(Bullseye) are supported. +Currently, images based on Ubuntu 20.04/22.04 (Focal/Jammy) and Debian 11(Bullseye) are supported. -Find your active `flavors`: +##### Find your active `type`s +`flavor` is just the OpenStack terminology for `type`. -``` +```commandline openstack flavor list --os-cloud=openstack ``` -#### Master or vpnWorker? +##### features (optional) +You can declare a list of features for a worker group. Those are then attached to each node in the worker group. 
+For example: +```yaml +workerInstance: + - type: de.NBI tiny + image: Ubuntu 22.04 LTS (2022-10-14) + count: 2 + features: + - hasdatabase + - holdsinformation +``` + +###### What's a feature? +Features allow you to force Slurm to schedule a job only on nodes that meet a certain `bool` constraint. +This can be helpful when only certain nodes can access a specific resource - like a database. + +If you would like to know more about how features exactly work, +take a look at [slurm's documentation](https://slurm.schedmd.com/slurm.conf.html#OPT_Features). + +#### Master or vpngtw? + +##### masterInstance -##### Master Only in the first configuration and only one: -``` + +```yaml masterInstance: type: de.NBI tiny image: Ubuntu 22.04 LTS (2022-10-14) ``` -##### vpnWorker: -Exactly once in every configuration but the first: +You can create features for the master [in the same way](#features-optional) as for the workers: + +```yaml + masterInstance: + type: de.NBI tiny + image: Ubuntu 22.04 LTS (2022-10-14) + features: + - hasdatabase + - holdsinformation ``` - vpnWorker: + +##### vpngtw: + +Exactly one in every configuration but the first: + +```yaml + vpngtw: type: de.NBI tiny image: Ubuntu 22.04 LTS (2022-10-14) ``` #### sshUser (required) + `sshUser` is the standard user of the installed images. For `Ubuntu 22.04` this would be `ubuntu`. #### region (required) + Every [region](https://docs.openstack.org/python-openstackclient/rocky/cli/command-objects/region.html) has its own openstack deployment. Every [avilability zone](#availabilityzone-required) belongs to a region. Find your `regions`: -``` + +```commandline openstack region list --os-cloud=openstack ``` - #### availabilityZone (required) + [availability zones](https://docs.openstack.org/nova/latest/admin/availability-zones.html) allow to logically group nodes. Find your `availabilityZones`: -``` + +```commandline openstack region list --os-cloud=openstack ``` #### subnet (required) + `subnet` is a block of ip addresses. Find available `subnets`: -``` +```commandline openstack subnet list --os-cloud=openstack ``` #### localDNSLookup (optional) + If no full DNS service for started instances is available, set `localDNSLookup: True`. -Currently the case in Berlin, DKFZ, Heidelberg and Tuebingen. \ No newline at end of file +Currently the case in Berlin, DKFZ, Heidelberg and Tuebingen. + +#### features (optional) + +You can declare a list of [features](#whats-a-feature) that are then attached to every node in the configuration. +If both [worker group](#features-optional) or [master features](#masterInstance) and configuration features are defined, +they are merged. \ No newline at end of file diff --git a/documentation/markdown/features/create.md b/documentation/markdown/features/create.md index 6efe52f0..7eba57a4 100644 --- a/documentation/markdown/features/create.md +++ b/documentation/markdown/features/create.md @@ -1,2 +1,87 @@ # Create -Temporary cluster keys will be stored in `~/.config/bibigrid/keys`. \ No newline at end of file + +Creates the cluster and prints information regarding further actions. +Temporary cluster keys will be stored in `~/.config/bibigrid/keys`. + +## Generates a keypair +Using `ssh-keygen -t ecdsa` a keypair is generated. +This keypair is injected into every started instance and is used by BiBiGrid to connect to instances. + + +## Configure Network +### Generates security groups +- When `Remote Security Group ID` is set, the rule only applies to nodes within that group id. 
+The rule cannot apply to nodes outside the cloud. +#### Default Security Group +- allows SSH from everywhere +- allows everything within the same security group +- +| Direction | Ethertype | Protocol | Port Range Min | Port Range Max | Remote IP Prefix | Remote Security Group ID | +|:---------:|:---------:|:--------:|:--------------:|:--------------:|:----------------:|:------------------------:| +| Ingress | IPv4 | None | None | None | None | Default Security Group | +| Ingress | IPv4 | TCP | 22 | 22 | 0.0.0.0/0 | None | + + +##### Default Security Group - Extra Rules: Multi-Cloud +When running a multi-cloud additionally the following rules are set: +- allows every TCP connection from the VPN (10.0.0.0/24) +- allows every TCP connection from other cidrs (other clouds) + +| Direction | Ethertype | Protocol | Port Range Min | Port Range Max | Remote IP Prefix | Remote Security Group ID | +|:---------:|:---------:|:--------:|:--------------:|:--------------:|:----------------:|:------------------------:| +| Ingress | IPv4 | TCP | None | None | 10.0.0.0/24 | None | +| Ingress | IPv4 | TCP | None | None | other_cidrs | None | + +#### Wireguard Security Group +Only created when multi-cloud is used (more than one configuration in [configuration](configuration.md) file). +- allow every UDP connection from the other clouds over 51820 (necessary for [WireguardVPN](../software/wireguard.md)). + +| Direction | Ethertype | Protocol | Port Range Min | Port Range Max | Remote IP Prefix | Remote Security Group ID | +|:---------:|:---------:|:--------:|:--------------:|:--------------:|:----------------:|:------------------------:| +| Ingress | IPv4 | UDP | 51820 | 51820 | other_cidrs | None | + +### Allowed Addresses +- For every cloud C, all other clouds' cidr is set as an `allowed_address` with the mac address of C. +This prevents outgoing addresses with the "wrong" mac address, ip combination from getting stopped by port security. + +## Starts master and vpngtws + +For the first configuration a master, for all others a vpngtw is started. + +## Uploads Data + +The [playbook](../../../resources/playbook) and [bin](../../../resources/bin) is uploaded. + +## Executes Ansible + +### Preparation +- Automatic updates are deactivated on host machine +- Python is installed +- Move playbook contents to new home `/opt/playbook/` and set rights accordingly +- Wait until dpkg lock is released +- Install `ansible.cfg` to `/etc/ansible/ansible.cfg` + +### Execution + +The playbook is executed. Read more about the exact steps of execution [here](bibigrid_ansible_playbook.md). + +## Prints Cluster Information + +At the end the cluster information is printed: +- cluster id +- master's public ip +- How to connect via SSH +- How to terminate the cluster +- How to print detailed cluster info +- How to connect via IDE Port Forwarding (only if [ide](configuration.md#ide-optional)) +- Duration + +### Print Example +``` +Cluster myclusterid with master 123.45.67.89 up and running! 
+SSH: ssh -i '/home/user/.config/bibigrid/keys/tempKey_bibi-myclusterid' ubuntu@123.45.67.89 +Terminate cluster: ./bibigrid.sh -i '/home/xaver/.config/bibigrid/hybrid.yml' -t -cid myclusterid +Detailed cluster info: ./bibigrid.sh -i '/home/xaver/.config/bibigrid/hybrid.yml' -l -cid myclusterid +IDE Port Forwarding: ./bibigrid.sh -i '/home/xaver/.config/bibigrid/hybrid.yml' -ide -cid myclusterid +--- 12 minutes and 0.9236352443695068 seconds --- +``` diff --git a/documentation/markdown/features/ide.md b/documentation/markdown/features/ide.md index 6093e746..bb31faab 100644 --- a/documentation/markdown/features/ide.md +++ b/documentation/markdown/features/ide.md @@ -1,2 +1,7 @@ # Web IDE +Expects `-cid` set and starts [Theia Web IDE](../software/theia_ide.md). + +## Port Forwarding +Tries to forward `localhost:9191` to `remote:8181` (Theia listens at `8181`). +In case this port is already in use, a number between 1 and 100 is added to the last attempted port and a new attempt is made. \ No newline at end of file diff --git a/documentation/markdown/features/list_clusters.md b/documentation/markdown/features/list_clusters.md index 0f832117..6318e2bf 100644 --- a/documentation/markdown/features/list_clusters.md +++ b/documentation/markdown/features/list_clusters.md @@ -1 +1,5 @@ -# List Clusters \ No newline at end of file +# List Clusters +When `-cid` is defined, lists that specific cluster in detail. Otherwise lists short cluster information of all clusters. + +## Too much information +Currently, a lot of information is listed unfiltered. Reducing the list to all necessary information is wip. \ No newline at end of file diff --git a/documentation/markdown/features/multi_cloud.md b/documentation/markdown/features/multi_cloud.md new file mode 100644 index 00000000..2d573e54 --- /dev/null +++ b/documentation/markdown/features/multi_cloud.md @@ -0,0 +1,53 @@ +# Multi-Cloud + +Multi-Cloud BiBiGrid allows for an easy cluster creation and management across multiple clouds. +With this configuration slurm will span over all given clouds and NFS share will be accessible by every node independent of its cloud. +Due to the high level of abstraction (VPN), using BiBiGrid's multi-cloud clusters is no more difficult than BiBiGrid's single cloud cluster. +However, the [configuration](configuration.md#configuration-list) (which contains all relevant information for most users) +of course needs to contain two cloud definitions, and you need access to both clouds. +Due to BiBiGrid's cloud separation by partition, users can specifically address individual clouds. + +Slides briefly covering the development: [ELIXIR Compute 2023 -- Multi-Cloud - BiBiGrid.pdf](../../pdfs/ELIXIR%20Compute%202023%20--%20Multi-Cloud%20-%20BiBiGrid.pdf). + +What follows are implementation details that are not relevant for most users. + +## DNS Server +DNS is provided by [dnsmasq](../software/dnsmasq.md). All instances are added whether they are started once (master, vpngtw) +or on demand (workers). Explicitly, BiBiGrid manages adding workers to [dnsmasq](../software/dnsmasq.md) on creation +triggered by [create_server](../../../resources/playbook/roles/bibigrid/files/slurm/create_server.py) and executed by ansible +by task [003-dns.yml](../../../resources/playbook/roles/bibigrid/tasks/003-dns.yml). + +## VPN - Wireguard +[Wireguard](../software/wireguard.md) creates a VPN between all vpngtw and the master node. 
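+
+Once a multi-cloud cluster is running, the tunnel can be inspected on the master or a vpngtw, for example like this
+(a sketch; the second command assumes the `wireguard-tools` package is available on the instance):
+
+```sh
+# show the wg0 interface and, if wireguard-tools is installed, its peers
+ip addr show wg0
+sudo wg show wg0
+```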
+
+### Keypair
+A single keypair (X25519, base64 encoded) is [generated](../../../bibigrid/core/utility/wireguard/wireguard_keys.py) by BiBiGrid on cluster
+creation and distributed via SSH to the master and every vpngtw.
+
+### Interface
+Using systemd-networkd, a persistent wg0 interface is [created by ansible](../../../resources/playbook/roles/bibigrid/tasks/002-wireguard-vpn.yml)
+in order to enable Wireguard.
+
+## Port Security
+By default, OpenStack prevents packets from **leaving** an instance if its IP and MAC do not match.
+This mismatch happens because vpngtws forward packets from the master to remote workers and from the workers back to the master.
+While forwarding, the MAC address changes, but the IP remains the IP of the worker/master. Therefore, IP and MAC mismatch.
+
+By adding `Allowed Address Pairs`, OpenStack knows that it should allow those mismatched packets.
+These `Allowed Address Pairs` are added to the master and to every vpngtw.
+
+For that, Ansible creates [userdata](../../../resources/playbook/roles/bibigrid/tasks/042-slurm-server.yml)
+files that are later [injected](../../../resources/playbook/roles/bibigrid/files/slurm/create_server.py)
+into started worker instances by `create_server.py`, triggered by Slurm.
+
+## MTU Probing
+MTU probing is necessary because the MTU might differ across networks. This is [handled by ansible](../../../resources/playbook/roles/bibigrid/tasks/002-wireguard-vpn.yml).
+
+## IP Routes
+In order to allow workers to communicate over the VPN, they need to know how to use it.
+Therefore, IP routes are [set by ansible](../../../resources/playbook/roles/bibigrid/tasks/000-add-ip-routes.yml)
+for the workers, telling them how to contact the master.
+
+## Deactivating Netplan
+Netplan is [deactivated by ansible](../../../resources/playbook/roles/bibigrid/tasks/000-add-ip-routes.yml)
+in order to avoid the previously set IP routes being overwritten.
\ No newline at end of file
diff --git a/documentation/markdown/features/terminate_cluster.md b/documentation/markdown/features/terminate_cluster.md
index a47eb289..775699cd 100644
--- a/documentation/markdown/features/terminate_cluster.md
+++ b/documentation/markdown/features/terminate_cluster.md
@@ -1 +1,20 @@
-# Terminate Cluster
\ No newline at end of file
+# Terminate Cluster
+Terminates a cluster. Asks for confirmation if debug mode is active or no local keypair matching the cluster id can be found.
+
+## Delete Local Keypairs
+Local keypairs are deleted, because they are no longer needed after cluster termination.
+Keypairs are stored in `~/.config/bibigrid/keys`.
+
+## Terminate Servers
+All servers belonging to the cluster are deleted - on every cloud the cluster spans.
+
+## Delete Remote Keypairs
+All remote keypairs are deleted.
+
+## Delete Application Credentials
+Application credentials are deleted - if any have been created.
+
+## Delete Security Groups
+Security groups are deleted (the default group and, for multi-cloud clusters, the wireguard group).
+BiBiGrid will attempt security group deletion multiple times, because providers need a little time between server deletion and
+security group deletion.
\ No newline at end of file
diff --git a/documentation/markdown/features/update.md b/documentation/markdown/features/update.md
index 3e9ff9ec..40ea97dd 100644
--- a/documentation/markdown/features/update.md
+++ b/documentation/markdown/features/update.md
@@ -1 +1,5 @@
-# Update
\ No newline at end of file
+# Update
+
+Updates the master's Ansible playbook and nothing else; you cannot declare new instances this way.
+Only relevant if a fix or a new feature is added to the ansible-playbook. +In the future we will try to further enhance this feature. \ No newline at end of file diff --git a/documentation/markdown/features/version.md b/documentation/markdown/features/version.md index e04a043b..c5ac0cd3 100644 --- a/documentation/markdown/features/version.md +++ b/documentation/markdown/features/version.md @@ -1 +1,3 @@ -# Version \ No newline at end of file +# Version + +Prints BiBiGrid's current version. \ No newline at end of file diff --git a/documentation/markdown/software/ansible.md b/documentation/markdown/software/ansible.md index f7e02ac8..dc73d8fb 100644 --- a/documentation/markdown/software/ansible.md +++ b/documentation/markdown/software/ansible.md @@ -1,39 +1,50 @@ # Ansible ## Ansible Tutorial + - [Ansible Workshop Presentation](https://docs.google.com/presentation/d/1W4jVHLT8dB1VsdtxXqtKlMqGbeyEWTQvSHh0WMfWo2c/edit#slide=id.p10) - [de.NBI Cloud's Ansible Course](https://gitlab.ub.uni-bielefeld.de/denbi/ansible-course) ## Executing BiBiGrid's Playbook Manually -Only execute BiBiGrid's playbook manually when no worker is up. The playbook is executed automatically for workers powering up. + +Only execute BiBiGrid's playbook manually when no worker is up. The playbook is executed automatically for workers +powering up. If you've implemented changes to BiBiGrid's playbook, you might want to execute BiBiGrid's playbook manually to see how those changes play out. For this we need the preinstalled `bibigrid-playbook` command. However, BiBiGrid has a handy shortcut for that called `bibiplay`. ### bibiplay + To make things easier we wrote the [bibiplay](..%2F..%2F..%2Fresources%2Fbin%2Fbibiplay) wrapper. It's used like this: + ```sh bibiplay ``` + is the same as: + ```sh ansible-playbook /opt/playbook/site.yml /opt/playbook/ansible_hosts/ ``` + any additional arguments are passed to `ansible-playbook`: + ```sh bibiplay -l master ``` + is the same as: + ```sh ansible-playbook /opt/playbook/site.yml /opt/playbook/ansible_hosts/ -l master ``` ### Useful commands -For more options see [ansible-playbook's manpage](https://linux.die.net/man/1/ansible-playbook). +For more options see [ansible-playbook's manpage](https://linux.die.net/man/1/ansible-playbook). -| Summary | Command | -|:----------------------------------------------------------------:|:-----------------------------:| -| Prepare master manually | `bibiplay -l master` | -| Prepare only slurm on master manually | `bibiplay -l master -t slurm` | +| Summary | Command | +|:-------------------------------------:|:-----------------------------:| +| Prepare master manually | `bibiplay -l master` | +| Prepare only slurm on master manually | `bibiplay -l master -t slurm` | diff --git a/documentation/markdown/software/dnsmasq.md b/documentation/markdown/software/dnsmasq.md new file mode 100644 index 00000000..73ae7a33 --- /dev/null +++ b/documentation/markdown/software/dnsmasq.md @@ -0,0 +1,4 @@ +# Dnsmasq + +Using dnsmasq became necessary due to BiBiGrid's multi-cloud development. +However, managing dns by BiBiGrid comes with more advantages, because now BiBiGrid doesn't depend on clouds' DNS configuration. 
\ No newline at end of file diff --git a/documentation/markdown/software/nfs.md b/documentation/markdown/software/nfs.md index 9e47c349..ffdc6b40 100644 --- a/documentation/markdown/software/nfs.md +++ b/documentation/markdown/software/nfs.md @@ -1,58 +1,71 @@ # Network File System (NFS) + NFS is used as an abstraction layer to allow users to work naturally across file systems. In a cluster setup working across file systems is really important when multiple nodes work on the same data. Most BiBiGrid users will never really interact consciously with BiBiGrid's underlying NFS, but simply use it. ## How To Configure NFS Shares? -When starting an ansible cluster, at least `/vol/spool` is initialised as an NFS share if the key + +When starting an ansible cluster, at least `/vol/spool` is initialised as an NFS share if the key [nfs](../features/configuration.md#nfs--optional-) is `True`. -Further NFS shares can then be configured using configuration's +Further NFS shares can then be configured using configuration's [nfsshares](../features/configuration.md#nfsshares--optional-) key. + ### Manually Creating NFS Shares -We discourage bypassing BiBiGrid's [configuration](../features/configuration.md#nfsshares--optional-) by creating + +We discourage bypassing BiBiGrid's [configuration](../features/configuration.md#nfsshares--optional-) by creating additional NFS shares manually, because they will not be automatically registered by scheduled workers. ## Useful Commands -| Summary | Command | Explanation & Comment | -|:----------------------:|:----------------------:|:------------------------------------------------------:| -| List all nfs shares | `showmount --exports` | | +| Summary | Command | Explanation & Comment | +|:-------------------:|:---------------------:|:---------------------:| +| List all nfs shares | `showmount --exports` | | ### NFS commands + See [nfs' manpage](https://man7.org/linux/man-pages/man5/nfs.5.html). ## How To Share an Attached Volume + By mounting a volume into a shared directory, volumes can be shared. ### Configuration + Let's assume our configuration holds (among others) the keys: + ```yml nfs: True masterMounts: - - testMount + - testMount nfsShares: - - testShare + - testShare ``` Where `testMount` is an existing, formatted volume with a filesystem type (for example: ext4, ext3, ntfs, ...). During cluster creation... + 1. BiBiGrid sets up the nfsShare `/testShare`. 2. BiBiGrid attached the volume `testMount` to our master instance. The volume is not mounted yet. 3. We call the cluster `bibigrid-master-ournfsclusterid` in the following. ### Mounting a Volume Into a Shared Directory + In order to mount a volume into a shared directory, we first need to identify where our volume was attached. -#### Find Where Volume was attached +#### Find Where Volume Has Been Attached + Executing this openstack client command will give us a list of volumes. Most likely it is best run from your local machine. + ```sh openstack volume list --os-cloud=openstack ``` + Result: | ID | Name | Status | Size | Attached to | @@ -61,9 +74,10 @@ Result: | 42424242-4242-4242-4242-424242424221 | testMount | in-use | Y | Attached to bibigrid-master-ournfsclusterid on /dev/vdd | As you can see, the volume `testMount` was attached to `/dev/vdd`. -We can double-check whether `/dev/vdd` really exists by executing `lsblk` or `lsblk | grep /dev/vdd` on the master. +We can double-check whether `/dev/vdd` really exists by executing `lsblk` or `lsblk | grep /dev/vdd` on the master. 
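+
+For example (the device name depends on the `Attached to` column above):
+
+```sh
+# show only the device the volume was attached to; fails if it does not exist
+lsblk /dev/vdd
+```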
#### Mount Volume into Share + As our NFS share is `/testShare`, we now need to mount `dev/vdd` into `testShare`: ```sh diff --git a/documentation/markdown/software/slurm.md b/documentation/markdown/software/slurm.md index ac1bd165..ff709b59 100644 --- a/documentation/markdown/software/slurm.md +++ b/documentation/markdown/software/slurm.md @@ -1,9 +1,9 @@ # Slurm -Be aware that due to BiBiGrid's slurm configuration the default behavior of commands might differ slightly from slurm's defaults. -Everything described below explains how slurm will behave in BiBiGrid's context. ## Slurm Client + ### Useful commands + For more options see [slurm client's manpage](https://manpages.debian.org/testing/slurm-client/slurm-wlm.1). | Summary | Command | Explanation & Comment | @@ -19,18 +19,19 @@ For more options see [slurm client's manpage](https://manpages.debian.org/testin |:---------------------------------------------------------------------------------:|:--------------------------------------------:| | [NODE STATE CODES](https://slurm.schedmd.com/sinfo.html#SECTION_NODE-STATE-CODES) | Very helpful to interpret `sinfo` correctly. | - ## REST API -BiBiGrids configures Slurm's REST API Daemon listening on `0.0.0.0:6420`. +BiBiGrids configures Slurm's REST API Daemon listening on `0.0.0.0:6820`. Get token for user slurm + ```shell -$ scontrol token -u slurm +$ sudo scontrol token username=slurm [lifespan=