Cloud migration April 2024 #195

Merged

Conversation

@sanjaysrikakulam (Member) commented Apr 26, 2024

This PR:

  1. adds the new state file based on the deployments in the new cloud
  2. disables the outputs of the upload instance, as that instance is not currently deployed in the new cloud
  3. adds new security groups, updates the existing ones, and migrates whatever was still in use in the old cloud
  4. removes empty files
  5. temporarily comments out/disables some user VMs, as they are still running in the old cloud for user data backup. Once the users complete their backups, we can remove the VMs in the old cloud, uncomment them here, and re-provision them.
  6. temporarily disables the beacon resources, as they are still running in the old cloud and need to be migrated. They are commented out, so the VMs won't be spawned in the new cloud until they are ready for migration.
  7. removes some of the user VMs. These exist only in the old cloud, and the users have confirmed that their VMs can be removed, so we do not need their config files in the new cloud.
  8. creates a new plausible Terraform config file that uses a snapshot as the image for the VM (a minimal sketch of this pattern follows the list). The snapshot was created in the old cloud and moved to the new cloud. Similarly, the DNS resource config was moved out of the dns.tf file.
  9. disables/comments out both the mq and the upload instance resources, as they will likely be moved to KVM. These VMs are still running in the old cloud.
  10. removes unused DNS CNAME records from the maintenance host
  11. disables/comments out the bronze and silver workers, as they are no longer needed. The silver worker in the old cloud is still running for the time being, and since the new cloud only supports V3 block storage, the block storage resource in the gold worker TF file was updated.
  12. moves the FTP DNS record from the dns.tf file into the instance TF file itself
  13. for dokku, influxdb, and stats, uses snapshots from the old cloud as the images and reattaches the same volumes from the old cloud. Because the existing volumes are reattached, the block volume creation was commented out; the new cloud only supports V3 block volumes from now on (see the volume-attachment sketch in the Volumes section below).
  14. for CVMFS stratum 0 and 1, likewise uses snapshots from the old cloud as the images and reattaches the same volumes, with block volume creation commented out for the same reason
  15. changes the flavor for celery. Bjoern suggested that this should be fine, as celery is more IO-bound than CPU/memory-bound.
  16. creates a new apollo Terraform config file that uses the snapshot from the old cloud as the image of the VM in the new cloud
  17. adds allow_overwrite to some DNS resources and moves the plausible, apollo, and FTP DNS resources into their own compute instance files
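
A minimal Terraform sketch of the snapshot-as-image and per-instance DNS pattern described in items 8, 16, and 17. All resource names, the image/flavor/network names, and the Route53-style DNS resource are assumptions for illustration; the actual values and the DNS provider used in this repo may differ.

resource "openstack_compute_instance_v2" "plausible" {
  name            = "plausible"
  image_name      = "plausible-snapshot-2024-04"   # snapshot uploaded from the old cloud
  flavor_name     = "m1.large"                     # placeholder flavor
  key_pair        = "cloud_keypair"                # placeholder key pair
  security_groups = ["default", "public-web"]      # placeholder security groups

  network {
    name = "public"                                # placeholder network
  }
}

# DNS record kept next to the instance instead of in dns.tf;
# allow_overwrite lets Terraform take over a record that already exists in the zone.
resource "aws_route53_record" "plausible" {
  zone_id         = "ZXXXXXXXXXXXXXX"              # placeholder hosted zone ID
  name            = "plausible.galaxyproject.eu"   # placeholder record name
  type            = "A"
  ttl             = 600
  records         = [openstack_compute_instance_v2.plausible.access_ip_v4]
  allow_overwrite = true
}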

Snapshots:
I created snapshots of the VMs, downloaded them, and uploaded them to the new cloud. Once the upload was done, a property (--property hw_video_model=cirrus) was added to each image; due to some config changes in the new cloud, this property is required for all images uploaded there. Manuel said he would investigate this.

How to add a property to an uploaded image

openstack image set --property hw_video_model=cirrus <image_name>

Volumes:

  1. Manually detach the volume from the VM in the old cloud
  2. Create a snapshot if needed
  3. Attach the same volume to the VM spawned in the new cloud (set the device path in the TF file, comment out the block volume resource creation, and keep in mind that only block volume V3 is supported in the new cloud; a sketch follows this list)
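
A minimal sketch of step 3 in Terraform, assuming the instance resource is defined in the same file; the resource names, volume ID, and device path are placeholders.

# The block volume resource stays commented out, because the existing volume
# from the old cloud is reattached instead of being created:
#
# resource "openstack_blockstorage_volume_v3" "influxdb_data" {
#   name = "influxdb-data"
#   size = 500
# }

resource "openstack_compute_volume_attach_v2" "influxdb_data" {
  instance_id = openstack_compute_instance_v2.influxdb.id
  volume_id   = "00000000-0000-0000-0000-000000000000"  # pre-existing volume from the old cloud
  device      = "/dev/vdb"                               # device path set explicitly in the TF file
}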

OS Images:
I downloaded the OS images currently in use from the old cloud, uploaded them to the new cloud, and set the property mentioned above on the images.
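
The uploads were done manually with the openstack CLI, but for illustration the same upload plus the required property could also be expressed in Terraform; the image name, file path, and formats below are assumptions.

resource "openstack_images_image_v2" "rockylinux9" {
  name             = "rockylinux-9-x86_64"          # placeholder image name
  local_file_path  = "/tmp/rockylinux-9.qcow2"      # image downloaded from the old cloud
  container_format = "bare"
  disk_format      = "qcow2"

  properties = {
    hw_video_model = "cirrus"                       # property required for all images in the new cloud
  }
}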

I encountered various issues while snapshotting and reattaching volumes, but I am not sure what, or how, to document from that troubleshooting.

Ref: https://github.com/usegalaxy-eu/issues/issues/533

@sanjaysrikakulam (Member, Author) commented:

I have now added all the deployments I have done manually so far to this PR to keep track of everything.

More changes are yet to come as we progress with the migration.

@sanjaysrikakulam (Member, Author) commented:

New cloud credentials (clouds.yaml) were created and added to the Jenkins credentials store. The Jenkins projects need to be reconfigured to use the new credentials; they are currently disabled because all the changes are custom and are being deployed locally.
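
For reference, a minimal sketch of how a Terraform provider block can select the new cloud's entry from the clouds.yaml by name; the entry name is an assumption.

provider "openstack" {
  cloud = "freiburg_galaxy"   # assumed clouds.yaml entry name; can also be set via OS_CLOUD
}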

@sanjaysrikakulam (Member, Author) commented May 21, 2024

I have added an openrc credential (new cloud; user: freiburg_galaxy (service user account)) to our Jenkins and reconfigured the infrastructure, infrastructure_pr, vgcn-generic-internal, vgcn-worker-gpu-internal, and vgcn-workers-internal Jenkins projects to use the new cloud credentials.

@sanjaysrikakulam marked this pull request as ready for review May 21, 2024 13:12
@sanjaysrikakulam (Member, Author) commented:

Does anyone want to have a look at this PR?

@mira-miracoli (Contributor) left a comment:

I haven't fully checked the terraform.tfstate, but I can if you think it should be checked.
The "ingress-from-proxy" secgroup was for flower, but we now access flower via Tailscale, which is more secure and means we don't have to manage login credentials.

Otherwise everything looks fine to me. Thank you, this looks like a lot of work!

Inline review threads (resolved): instance_core_stats.tf, secgroup_ingress-from-proxy.tf, secgroup_ufr-ingress.tf, secgroup_interactive_egress.tf
@sanjaysrikakulam (Member, Author) commented:

Thank you, @mira-miracoli, for reviewing this. :)

@sanjaysrikakulam merged commit fac482e into usegalaxy-eu:main on May 22, 2024
1 check passed