diff --git a/notes/millingw/20240229-01-malcolm-handover-notes.txt b/notes/millingw/20240229-01-malcolm-handover-notes.txt new file mode 100644 index 00000000..82cf1535 --- /dev/null +++ b/notes/millingw/20240229-01-malcolm-handover-notes.txt @@ -0,0 +1,157 @@ +Cient VM Details: +Vanilla Ubuntu VM on EIDF cloud (hereafter refered to as 'the client VM') +Authorisation issues encounted when using podman with normal EIDF freeipa - authenticated account. +To get round this, created local unix user malcolm, with sudo access +Used local user malcolm for all subsequent deployment. + +Reserve development cluster using shared document at https://github.com/wfau/gaia-dmp/wiki/Cloud-assignments + +Github setup: +Followed guide at https://gist.github.com/xirixiz/b6b0c6f4917ce17a90e00f9b60566278 +Private, public keys stored on client VM under malcolm account's .ssh directory + +Created own millingw fork of gaia-dmp via github gui +On client VM, clone fork into ~malcolm +cd ~malcolm +git clone git@github.com:millingw/gaia-dmp.git + +Logged into arcus horizon interface. +Created application credentials for blue, data +Warning! horizon lists credentials for all projects, although each credential only applies to a specific project. +Name credentials clearly, blue_credential, data_credential - lost time unnecessarily due to not realising I was overwriting credentials + +Broadly followed (with a few caveats) notes at https://github.com/wfau/gaia-dmp/blob/c978fd081ed4d3f25367155e9d31ef297512a1c8/notes/zrq/20240207-02-test-deploy.txt + +Contents of aglais.env in ~malcolm on client VM + +AGLAIS_REPO='git@github.com:millingw/gaia-dmp.git' +AGLAIS_CODE="${HOME}/gaia-dmp" +PATH="${PATH}:${AGLAIS_CODE}/bin" +export PATH + +Contents of clouds.yaml in ~malcolm + +clouds: + iris-gaia-blue: + auth: + auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000 + application_credential_id: "4515530e7cea40f89a4c057f658fe8bc" + application_credential_secret: <****> + region_name: "RegionOne" + interface: "public" + identity_api_version: 3 + auth_type: "v3applicationcredential" + + iris-gaia-data: + auth: + auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000 + application_credential_id: "4825e2f9d651464b8e75d1731447e313" + application_credential_secret: <*******> + region_name: "RegionOne" + interface: "public" + identity_api_version: 3 + auth_type: "v3applicationcredential" + + +Warning: +Have to run ssh-agent commands each time opening a new terminal on the client VM! +Perhaps due to some weirdness with the VM authentication ... + +Followed client software installation + +source "${HOME:?}/aglais.env" + +check $AGLAIS_CODE does point to where we have gaia-dmp checked out + +sanity check live host - don't proceed if it appears we are targeting live host! + +sourcing aglais.env should put all scripts on the path, if not then something has gone wrong + +create the client, first time may take a little while as things are downloaded / cached + +ansi-client 'blue' +(following instructions all assume executing inside ansi-client) +Warning: on the client VM, exiting ansi-client leaves a listener hogging port 3000, need to manually kill before restarting the client) + + + +run the deploy - this does a complete teardown and rebuild of the targeted environment. can take a while. +source /deployments/hadoop-yarn/bin/deploy.sh + +Note: this went through a period of not working for some days, appearing like internal routing errors between the different arcus hardware nodes +If the zepellin jars fail to be downloaded then the problem has reappeared +Note: the arcus metadata server can have a bad day, in which case errors may appear in deleting / creating ceph mounts ... + +Import the test users and run tests: + +source /deployments/admin/bin/create-user-tools.sh +import-test-users + +This creates a set of canned users on the cluster, with ceph volumes and example notebooks +any issues with ceph will show up at this point + +(Could have followed instructions to run benchmarks, but didn't due to versioning issue causing known errors) + +If we got here, things are looking ok + +enable the user creation tools, and start to create our known users + +source /deployments/admin/bin/create-user-tools.sh +import-live-users + +live users are created, but only in this instance (ie red, green unaffected) +verbose output file created in /tmp/live-users.json +various tools to inspect the users and the unix / ceph resources + +list-username /tmp/live-users.json +list-usernames /tmp/live-users.json +less /tmp/live-users.json +list-linux-info /tmp/live-users.json +list-shiro-info /tmp/live-users.json +list-shiro-full /tmp/live-users.json +list-ceph-info /tmp/live-users.json +list-note-copy /tmp/live-users.json + +To add a new user: +In a new terminal on the client VM, cd to /home/malcolm/gaia-dmp/deployments/common/users +Followed Dave's notes to create branch on millingw fork +https://github.com/wfau/gaia-dmp/blob/master/notes/zrq/20240212-01-git-branch.txt + +edit live-users.yml file +add a new user at the end, e.g. (pick next sequential linuxuid) +- name: "MIllingworth" + type: "live" + linuxuid: 10032 + +save, got back to terminal running ansi-client and rerun import-live-users +need to watch output, this is the only opportunity to see what password has been set for the new user +assuming successful, go to other terminal, commit the branch and do a pull request into main + +at this point, user is live on our node but not on live system +need the user's passhass +in the ansi-client window, display the new user's details +list-shiro-full /tmp/live-users.json +copy the passhash field for the new user at the end of the output + +now we add the details to the live system +ssh to the controller on data +ssh fedora@data.gaia-dmp.uk +edit the file 'passhashes' to add the user's accountname and passhash copied from the list-shiro-full output + +Finally, send introductory email to the user, following form of https://github.com/wfau/gaia-dmp/blob/master/docs/emails/welcome-email.txt and including new password from output above + +To remove a rogue user: +ssh fedora@data.gaia-dmp.uk +connect to the mysql instance and remove the passhash row containing the user's name +remove (or preferably comment out with note) the user from the live users file +rerun create live users + + + + + + + + + +