diff --git a/docs/img/AROMonitor.png b/docs/img/AROMonitor.png
new file mode 100644
index 00000000000..a4c74a0d3cc
Binary files /dev/null and b/docs/img/AROMonitor.png differ
diff --git a/docs/img/SOCATConnection.png b/docs/img/SOCATConnection.png
new file mode 100644
index 00000000000..2caeb16c11a
Binary files /dev/null and b/docs/img/SOCATConnection.png differ
diff --git a/docs/unit-testing-for-monitoring-metrics.md b/docs/unit-testing-for-monitoring-metrics.md
new file mode 100644
index 00000000000..0943262b9dd
--- /dev/null
+++ b/docs/unit-testing-for-monitoring-metrics.md
@@ -0,0 +1,71 @@
+
+# Testing ARO Monitor Metrics
+
+
+
+## The Monitor Architecture
+
+The ARO monitor component (the part of the aro binary you activate when you execute ./cmd/aro monitor) collects and emits the metrics about cluster health (and its own health) that we want to see in Geneva.
+
+To send data to Geneva, the monitor uses an instance of a Geneva MDM container as a proxy for the Geneva API. The MDM container accepts statsd-formatted data (the Azure Geneva version of statsd, that is) over a Unix domain socket. The MDM container then forwards the metric data over an HTTPS link to the Geneva API. Please note that a Unix socket can only be accessed from the same machine.
+
+The monitor reads the information about which clusters it should actually monitor from its corresponding Cosmos DB. If multiple monitor instances run in parallel (i.e. connect to the same database instance), as is the case in production, they negotiate which instance monitors which cluster (see [monitoring.md](./monitoring.md)).
+
+
+![Aro Monitor Architecture](img/AROMonitor.png "Aro Monitor Architecture")
+
+
+## Unit Testing Setup
+
+There are two ways to set this up:
+- Run the Geneva container locally.
+- Spawn a VM, start the Geneva container there and connect/tunnel to it.
+
+### Local container setup
+
+An example docker command to start the container locally is here (you will need to adapt some parameters):
+[Example](../hack/local-monitor-testing/sample/dockerStartCommand.sh)
+
+Two things to adapt:
+* Amongst other things, the container needs to be provided with the Geneva key and certificate. For the INT instance that is the rp-metrics-int.pem you find in the secrets folder after running `make secrets`. Copy that to /etc/mdm.pem or adapt the volume mount accordingly. The mdm container logs will tell you whether that worked.
+* When you start the monitor locally in local dev mode, it looks for the Unix socket file mdm_statsd.socket in the current path (usually the ./cmd/aro folder). Adapt the path in the start command accordingly.
+
+### Remote container setup
+
+If you can't run the container locally (for example because you are on macOS and your container tooling does not support Unix sockets, which is true for both Docker Desktop and podman) or don't want to, you can bring up the container on a Linux VM and connect via a socat/ssh chain:
+![SOCAT chain](img/SOCATConnection.png "SOCAT chain")
+
+The [deploy script](../hack/local-monitor-testing/deploy_MDM_VM.sh) deploys such a VM on Azure (if you set up ./env properly), configures it and installs the container.
+
+The [start script](../hack/local-monitor-testing/startMDMNetwork.sh) can then be used to establish the network connection as depicted in the diagram. For local VMs you may want to skip the ssh tunnel step.
+
+
+### Starting the monitor
+
+When starting the monitor, make sure to have your
+
+- CLUSTER_MDM_ACCOUNT
+- CLUSTER_MDM_NAMESPACE
+
+environment variables set to the Geneva account and namespace where your metrics are supposed to land in Geneva INT (https://jarvis-west-int.cloudapp.net/).
+
+Use `go run -tags aro ./cmd/aro monitor` to start the monitor. Check the monitor's current working directory: that is the folder in which the monitor looks for the mdm_statsd.socket file, so it must match the location where your mdm container or the socat command creates the socket.
+
+A VS Code launch config that does the same would look like this:
+
+````
+{
+    "name": "Launch Monitor",
+    "type": "go",
+    "request": "launch",
+    "mode": "auto",
+    "program": "./cmd/aro",
+    "buildFlags": "-tags aro",
+    "console": "integratedTerminal",
+    "args": ["-loglevel=debug",
+             "monitor"
+            ],
+    "env": {"CLUSTER_MDM_ACCOUNT": "",
+            "CLUSTER_MDM_NAMESPACE": "" }
+},
+````
\ No newline at end of file
diff --git a/hack/local-monitor-testing/configureRemote.sh b/hack/local-monitor-testing/configureRemote.sh
new file mode 100755
index 00000000000..d102ccc884b
--- /dev/null
+++ b/hack/local-monitor-testing/configureRemote.sh
@@ -0,0 +1,36 @@
+# Setup the VM
+rpm --import https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7
+rpm --import https://packages.microsoft.com/keys/microsoft.asc
+
+yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
+
+cat >/etc/yum.repos.d/azure.repo <<'EOF'
+[azure-cli]
+name=azure-cli
+baseurl=https://packages.microsoft.com/yumrepos/azure-cli
+enabled=yes
+gpgcheck=yes
+EOF
+
+yum --enablerepo=rhui-rhel-7-server-rhui-optional-rpms -y install \
+    azure-cli \
+    docker \
+    jq \
+    gcc \
+    rh-git29 \
+    rh-python36 \
+    tmpwatch \
+    lttng-usr \
+    gpgme-devel \
+    libassuan-devel \
+    socat
+
+
+sed -i -e 's/^OPTIONS='\''/OPTIONS='\''-G cloud-user /' /etc/sysconfig/docker
+
+systemctl enable docker
+systemctl restart docker
+
+
+
+
diff --git a/hack/local-monitor-testing/deploy_MDM_VM.sh b/hack/local-monitor-testing/deploy_MDM_VM.sh
new file mode 100755
index 00000000000..07350dd98e6
--- /dev/null
+++ b/hack/local-monitor-testing/deploy_MDM_VM.sh
@@ -0,0 +1,92 @@
+#!/bin/bash -e
+set +x
+
+BASE=$( git rev-parse --show-toplevel)
+
+HOSTNAME=$( hostname )
+NAME="mdm"
+MDMIMAGE=linuxgeneva-microsoft.azurecr.io/genevamdm:master_20211120.1
+MDMFRONTENDURL=https://int2.int.microsoftmetrics.com/
+MDMSOURCEENVIRONMENT=$LOCATION
+MDMSOURCEROLE=rp
+MDMSOURCEROLEINSTANCE=$HOSTNAME
+
+echo "Using:"
+
+echo "Resourcegroup = $RESOURCEGROUP"
+echo "User = $USER"
+echo "HOSTNAME = $HOSTNAME"
+echo "Containername = $NAME"
+echo "Location = $LOCATION"
+echo "MDM image = $MDMIMAGE"
+echo " (version hardcoded. Check against pkg/util/version/const.go if things don't work)"
+echo "Geneva API URL= $MDMFRONTENDURL"
+echo "MDMSOURCEENV = $MDMSOURCEENVIRONMENT"
+echo "MDMSOURCEROLE = $MDMSOURCEROLE"
+echo "MDMSOURCEROLEINSTANCE = $MDMSOURCEROLEINSTANCE"
+
+VMName="$USER-mdm-link"
+
+CLOUDUSER="cloud-user"
+
+
+
+if [ "$(az vm show -g $RESOURCEGROUP --name $VMName)" = "" ];
+then
+    echo "Creating VM $VMName in RG $RESOURCEGROUP"
+    az vm create -g $RESOURCEGROUP -n $VMName --image RedHat:RHEL:7-LVM:latest --ssh-key-values @~/.ssh/id_rsa.pub --admin-username $CLOUDUSER
+else
+    echo "VM already exists, skipping..."
+fi
+
+
+PUBLICIP=$( az vm list-ip-addresses --name $VMName -g $RESOURCEGROUP | jq -r '.[0].virtualMachine.network.publicIpAddresses[0].ipAddress' )
+
+echo "Found IP $PUBLICIP"
+
+scp $BASE/secrets/rp-metrics-int.pem $CLOUDUSER@$PUBLICIP:mdm.pem
+scp $BASE/hack/local-monitor-testing/configureRemote.sh $CLOUDUSER@$PUBLICIP:
+
+ssh $CLOUDUSER@$PUBLICIP "sudo cp mdm.pem /etc/mdm.pem"
+ssh $CLOUDUSER@$PUBLICIP "sudo ./configureRemote.sh"
+
+
+ssh $CLOUDUSER@$PUBLICIP "sudo docker pull $MDMIMAGE"
+
+cat <<EOF > $BASE/dockerStartCommand.sh
+docker run \
+  --entrypoint /usr/sbin/MetricsExtension \
+  --hostname $HOSTNAME \
+  --name $NAME \
+  -d \
+  --restart=always \
+  -m 2g \
+  -v /etc/mdm.pem:/etc/mdm.pem \
+  -v /var/etw:/var/etw:z \
+  $MDMIMAGE \
+  -CertFile /etc/mdm.pem \
+  -FrontEndUrl $MDMFRONTENDURL \
+  -Logger Console \
+  -LogLevel Warning \
+  -PrivateKeyFile /etc/mdm.pem \
+  -SourceEnvironment $MDMSOURCEENVIRONMENT \
+  -SourceRole $MDMSOURCEROLE \
+  -SourceRoleInstance $MDMSOURCEROLEINSTANCE
+EOF
+
+
+#disable SELINUX (don't shoot me)
+ssh $CLOUDUSER@$PUBLICIP "sudo setenforce 0"
+ssh $CLOUDUSER@$PUBLICIP "sudo getenforce"
+
+#make it permanent
+ssh $CLOUDUSER@$PUBLICIP "sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config"
+
+
+ssh $CLOUDUSER@$PUBLICIP "sudo firewall-cmd --zone=public --add-port=12345/tcp --permanent"
+ssh $CLOUDUSER@$PUBLICIP "sudo firewall-cmd --reload"
+
+
+scp $BASE/dockerStartCommand.sh $CLOUDUSER@$PUBLICIP:
+ssh $CLOUDUSER@$PUBLICIP "chmod +x dockerStartCommand.sh"
+ssh $CLOUDUSER@$PUBLICIP "sudo ./dockerStartCommand.sh &"
diff --git a/hack/local-monitor-testing/sample/dockerStartCommand.sh b/hack/local-monitor-testing/sample/dockerStartCommand.sh
new file mode 100644
index 00000000000..aab29bbc5d3
--- /dev/null
+++ b/hack/local-monitor-testing/sample/dockerStartCommand.sh
@@ -0,0 +1,52 @@
+
+
+BASE=$( git rev-parse --show-toplevel)
+
+SOCKETPATH="$BASE/cmd/aro"
+
+HOSTNAME=$( hostname )
+NAME="mdm"
+MDMIMAGE=linuxgeneva-microsoft.azurecr.io/genevamdm:master_20211120.1
+MDMFRONTENDURL=https://int2.int.microsoftmetrics.com/
+MDMSOURCEENVIRONMENT=$LOCATION
+MDMSOURCEROLE=rp
+MDMSOURCEROLEINSTANCE=$HOSTNAME
+
+
+echo "Using:"
+
+echo "Resourcegroup = $RESOURCEGROUP"
+echo "User = $USER"
+echo "HOSTNAME = $HOSTNAME"
+echo "Containername = $NAME"
+echo "Location = $LOCATION"
+echo "MDM image = $MDMIMAGE"
+echo " (version hardcoded. Check against pkg/util/version/const.go if things don't work)"
+echo "Geneva API URL= $MDMFRONTENDURL"
+echo "MDMSOURCEENV = $MDMSOURCEENVIRONMENT"
+echo "MDMSOURCEROLE = $MDMSOURCEROLE"
+echo "MDMSOURCEROLEINSTANCE = $MDMSOURCEROLEINSTANCE"
+
+cp $BASE/secrets/rp-metrics-int.pem /etc/mdm.pem
+
+
+
+
+podman run \
+  --entrypoint /usr/sbin/MetricsExtension \
+  --hostname $HOSTNAME \
+  --name $NAME \
+  -d \
+  --restart=always \
+  -m 2g \
+  -v /etc/mdm.pem:/etc/mdm.pem \
+  -v $SOCKETPATH:/var/etw:z \
+  $MDMIMAGE \
+  -CertFile /etc/mdm.pem \
+  -FrontEndUrl $MDMFRONTENDURL \
+  -Logger Console \
+  -LogLevel Debug \
+  -PrivateKeyFile /etc/mdm.pem \
+  -SourceEnvironment $MDMSOURCEENVIRONMENT \
+  -SourceRole $MDMSOURCEROLE \
+  -SourceRoleInstance $MDMSOURCEROLEINSTANCE
\ No newline at end of file
diff --git a/hack/local-monitor-testing/startMDMNetwork.sh b/hack/local-monitor-testing/startMDMNetwork.sh
new file mode 100755
index 00000000000..6a7387c78f0
--- /dev/null
+++ b/hack/local-monitor-testing/startMDMNetwork.sh
@@ -0,0 +1,32 @@
+#!/bin/bash -e
+set +x
+
+BASE=$( git rev-parse --show-toplevel)
+SOCKETFILE="$BASE/cmd/aro/mdm_statsd.socket"
+
+echo "Using:"
+
+echo "Resourcegroup = $RESOURCEGROUP"
+echo "User = $USER"
+
+VMName="$USER-mdm-link"
+CLOUDUSER="cloud-user"
+
+
+PUBLICIP=$( az vm list-ip-addresses --name $VMName -g $RESOURCEGROUP | jq -r '.[0].virtualMachine.network.publicIpAddresses[0].ipAddress' )
+
+echo "Found IP $PUBLICIP, starting socat on the mdm-link vm"
+ssh $CLOUDUSER@$PUBLICIP "sudo socat -v TCP-LISTEN:12345,fork UNIX-CONNECT:/var/etw/mdm_statsd.socket" &
+sleep 3
+
+echo "Starting SSH Tunnel"
+ssh $CLOUDUSER@$PUBLICIP -N -L 12345:127.0.0.1:12345 &
+sleep 3
+
+if [ -S "$SOCKETFILE" ] ; then  # -S, not -f: a stale Unix socket is not a regular file
+    rm "$SOCKETFILE"
+fi
+echo "Starting local socat link to the tunnel"
+socat -v UNIX-LISTEN:$SOCKETFILE,fork TCP-CONNECT:127.0.0.1:12345 &
+
+
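The prerequisites that docs/unit-testing-for-monitoring-metrics.md lists for starting the monitor (both MDM environment variables set, and a Unix socket named mdm_statsd.socket in the directory the monitor is started from) could be checked before launch with a small helper. This is only a sketch and not part of the change set: the function name `check_monitor_prereqs` is hypothetical, and it assumes the socket file name used in startMDMNetwork.sh.

```shell
# Hypothetical pre-flight check (not part of this PR).
# Usage: check_monitor_prereqs [directory the monitor is started from]
# Returns 0 if all prerequisites are met, 1 otherwise.
check_monitor_prereqs() {
    prereq_socket="${1:-$PWD}/mdm_statsd.socket"
    prereq_rc=0

    # Both variables must be non-empty so the metrics land in the
    # intended Geneva account and namespace.
    if [ -z "${CLUSTER_MDM_ACCOUNT:-}" ]; then
        echo "missing environment variable: CLUSTER_MDM_ACCOUNT" >&2
        prereq_rc=1
    fi
    if [ -z "${CLUSTER_MDM_NAMESPACE:-}" ]; then
        echo "missing environment variable: CLUSTER_MDM_NAMESPACE" >&2
        prereq_rc=1
    fi

    # -S is true only for socket files; a regular file with the
    # same name does not count.
    if [ ! -S "$prereq_socket" ]; then
        echo "no Unix socket at $prereq_socket" >&2
        prereq_rc=1
    fi

    return "$prereq_rc"
}
```

From the directory the monitor will run in, something like `check_monitor_prereqs "$PWD" && go run -tags aro ./cmd/aro monitor` would then refuse to start the monitor when the socket or the environment variables are missing.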