Add Documentation and Scripts for ARO Monitor Metric testing
Ulrich Schlueter authored and m1kola committed Jan 28, 2022
1 parent 336659c commit 1c0a5b3
Showing 7 changed files with 283 additions and 0 deletions.
Binary file added docs/img/AROMonitor.png
Binary file added docs/img/SOCATConnection.png
71 changes: 71 additions & 0 deletions docs/unit-testing-for-monitoring-metrics.md
@@ -0,0 +1,71 @@

# Testing ARO Monitor Metrics



## The Monitor Architecture

The ARO monitor component (the part of the aro binary that runs when you execute `./cmd/aro monitor`) collects and emits the metrics about cluster health, and about its own health, that we want to see in Geneva.

To send data to Geneva, the monitor uses an instance of a Geneva MDM container as a proxy for the Geneva API. The MDM container accepts statsd-formatted data (the Azure Geneva flavor of statsd, that is) over a Unix domain socket and forwards the metric data over an HTTPS link to the Geneva API. Note that a Unix socket can only be accessed from the same machine.

The monitor reads the information about which clusters it should actually monitor from its corresponding Cosmos DB. If multiple monitor instances run in parallel (i.e. connect to the same database instance), as is the case in production, they negotiate which instance monitors which cluster (see [monitoring.md](./monitoring.md)).


![Aro Monitor Architecture](img/AROMonitor.png "Aro Monitor Architecture")


## Unit Testing Setup

There are two ways to set this up:
- Run the Geneva container locally.
- Spawn a VM, start the Geneva container there and connect/tunnel to it.

### Local container setup

An example command (using podman) to start the container locally is here (you will need to adapt some parameters):
[Example](../hack/local-monitor-testing/sample/dockerStartCommand.sh)

Two things to adapt:
* Among other things, the container needs to be provided with the Geneva key and certificate. For the INT instance, that is the `rp-metrics-int.pem` file you find in the secrets folder after running `make secrets`. Copy it to `/etc/mdm.pem` or adapt the volume mount accordingly. The mdm container logs will tell you whether that worked.
* When you start the monitor in local dev mode, it looks for the Unix socket file `mdm_statsd.socket` in the current path (usually the `./cmd/aro` folder). Adapt the path in the start command accordingly.
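
Before starting the container, it can help to sanity-check the PEM file. The sample start command passes the same `/etc/mdm.pem` as both `-CertFile` and `-PrivateKeyFile`, so the sketch below assumes the file must contain a certificate block and a private key block. `pem_has_cert_and_key` is a hypothetical helper, not part of the repository, and it only looks for the PEM headers; it does not validate the certificate itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ARO-RP repo): succeed only if the
# given PEM file is readable and contains both a certificate header and a
# private key header. Assumes a combined PEM, as the sample command implies.
pem_has_cert_and_key() {
    pem_file="$1"
    [ -r "$pem_file" ] || return 1
    grep -q -- '-----BEGIN CERTIFICATE-----' "$pem_file" || return 1
    grep -q -- '-----BEGIN .*PRIVATE KEY-----' "$pem_file" || return 1
}

# Example usage (path as used in this document):
# pem_has_cert_and_key /etc/mdm.pem || echo "/etc/mdm.pem looks incomplete" >&2
```

If the check passes but metrics still do not arrive, the container logs remain the authoritative source (for example `podman logs mdm`, given the container name used in the sample script).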

### Remote container setup

If you can't run the container locally (for example because you are on macOS and your container tooling does not support Unix sockets, which is true for both Docker Desktop and podman), or you don't want to, you can bring up the container on a Linux VM and connect via a socat/ssh chain:
![SOCAT chain](img/SOCATConnection.png "SOCAT chain")

The [deploy script](../hack/local-monitor-testing/deploy_MDM_VM.sh) deploys such a VM on Azure (provided you have sourced your `./env` file so that variables such as RESOURCEGROUP and LOCATION are set), configures it and installs the container.

The [start script](../hack/local-monitor-testing/startMDMNetwork.sh) can then be used to establish the network connection as depicted in the diagram. For local VMs you may want to skip the ssh tunnel step.
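
For a VM that is directly reachable (no ssh tunnel), the local end of the chain could point socat straight at the VM instead of at 127.0.0.1. The following is a minimal sketch under that assumption. `local_socat_args` is a hypothetical helper that only prints the socat addresses, port 12345 is the one used by the start script, and the IP address in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ARO-RP repo): print the socat address
# pair for the local end of the link when the VM is reachable directly.
#   $1 = path of the local Unix socket to create
#   $2 = host name or IP address of the VM
#   $3 = TCP port socat listens on at the VM (defaults to 12345)
local_socat_args() {
    socket_file="$1"
    vm_host="$2"
    port="${3:-12345}"
    printf 'UNIX-LISTEN:%s,fork TCP-CONNECT:%s:%s\n' "$socket_file" "$vm_host" "$port"
}

# Example usage (192.168.122.10 is a placeholder address):
# socat -v $(local_socat_args ./cmd/aro/mdm_statsd.socket 192.168.122.10) &
```

The VM still has to run its own `socat TCP-LISTEN:12345,...` side, and the port has to be reachable from your machine.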


### Starting the monitor

When starting the monitor, make sure to have the

- CLUSTER_MDM_ACCOUNT
- CLUSTER_MDM_NAMESPACE

environment variables set to the Geneva account and namespace where your metrics are supposed to land in Geneva INT (https://jarvis-west-int.cloudapp.net/).
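
A small guard can catch a missing variable before the monitor starts. This is a sketch only: `require_mdm_env` is a hypothetical helper, not part of the repository, and it checks just the two variables named above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ARO-RP repo): fail if either of the
# two Geneva-related environment variables is unset or empty.
require_mdm_env() {
    missing=0
    for name in CLUSTER_MDM_ACCOUNT CLUSTER_MDM_NAMESPACE; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
# require_mdm_env && go run -tags aro ./cmd/aro monitor
```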

Use `go run -tags aro ./cmd/aro monitor` to start the monitor. Check what the current directory of your monitor is, because that is the folder in which the monitor searches for the `mdm_statsd.socket` file; it needs to match the location where your mdm container or the socat command creates the socket.
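
To confirm that the socket is where the monitor will look for it, a check along the following lines may help. `socket_ready` is a hypothetical helper, not part of the repository. It only tests that the path exists and is a Unix domain socket, and it cannot tell whether anything is still listening on the other end.

```shell
#!/bin/bash
# Hypothetical helper (not part of the ARO-RP repo): succeed only if the
# given path (default: mdm_statsd.socket in the current directory) exists
# and is a Unix domain socket.
socket_ready() {
    socket_file="${1:-./mdm_statsd.socket}"
    [ -S "$socket_file" ]
}

# Example usage, run from the directory the monitor will start in:
# socket_ready || echo "mdm_statsd.socket not found in $(pwd)" >&2
```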

A VS Code launch config that starts the monitor the same way as the `go run` command would look like this:

````
{
    "name": "Launch Monitor",
    "type": "go",
    "request": "launch",
    "mode": "auto",
    "program": "./cmd/aro",
    "buildFlags": "-tags aro",
    "console": "integratedTerminal",
    "args": [
        "-loglevel=debug",
        "monitor"
    ],
    "env": {
        "CLUSTER_MDM_ACCOUNT": "<PUT YOUR ACCOUNT HERE>",
        "CLUSTER_MDM_NAMESPACE": "<PUT YOUR NAMESPACE HERE>"
    }
}
````
36 changes: 36 additions & 0 deletions hack/local-monitor-testing/configureRemote.sh
@@ -0,0 +1,36 @@
#!/bin/bash
# Set up the VM: import package signing keys, add repos, install tooling
rpm --import https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7
rpm --import https://packages.microsoft.com/keys/microsoft.asc

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

cat >/etc/yum.repos.d/azure.repo <<'EOF'
[azure-cli]
name=azure-cli
baseurl=https://packages.microsoft.com/yumrepos/azure-cli
enabled=yes
gpgcheck=yes
EOF

yum --enablerepo=rhui-rhel-7-server-rhui-optional-rpms -y install \
    azure-cli \
    docker \
    jq \
    gcc \
    rh-git29 \
    rh-python36 \
    tmpwatch \
    lttng-ust \
    gpgme-devel \
    libassuan-devel \
    socat


sed -i -e 's/^OPTIONS='\''/OPTIONS='\''-G cloud-user /' /etc/sysconfig/docker

systemctl enable docker
systemctl restart docker




92 changes: 92 additions & 0 deletions hack/local-monitor-testing/deploy_MDM_VM.sh
@@ -0,0 +1,92 @@
#!/bin/bash -e
set +x

BASE=$( git rev-parse --show-toplevel)

HOSTNAME=$( hostname )
NAME="mdm"
MDMIMAGE=linuxgeneva-microsoft.azurecr.io/genevamdm:master_20211120.1
MDMFRONTENDURL=https://int2.int.microsoftmetrics.com/
MDMSOURCEENVIRONMENT=$LOCATION
MDMSOURCEROLE=rp
MDMSOURCEROLEINSTANCE=$HOSTNAME

echo "Using:"

echo "Resourcegroup = $RESOURCEGROUP"
echo "User = $USER"
echo "HOSTNAME = $HOSTNAME"
echo "Containername = $NAME"
echo "Location = $LOCATION"
echo "MDM image = $MDMIMAGE"
echo " (version hardcoded. Check against pkg/util/version/const.go if things don't work)"
echo "Geneva API URL= $MDMFRONTENDURL"
echo "MDMSOURCEENV = $MDMSOURCEENVIRONMENT"
echo "MDMSOURCEROLE = $MDMSOURCEROLE"
echo "MDMSOURCEROLEINSTANCE = $MDMSOURCEROLEINSTANCE"

VMName="$USER-mdm-link"

CLOUDUSER="cloud-user"



# Create the VM only if "az vm show" returns nothing for it.
if [ -z "$(az vm show -g "$RESOURCEGROUP" --name "$VMName")" ]; then
    echo "Creating VM $VMName in RG $RESOURCEGROUP"
    az vm create -g "$RESOURCEGROUP" -n "$VMName" --image RedHat:RHEL:7-LVM:latest --ssh-key-values @~/.ssh/id_rsa.pub --admin-username "$CLOUDUSER"
else
    echo "VM already exists, skipping..."
fi


PUBLICIP=$( az vm list-ip-addresses --name $VMName -g $RESOURCEGROUP | jq -r '.[0].virtualMachine.network.publicIpAddresses[0].ipAddress' )

echo "Found IP $PUBLICIP"

scp "$BASE/secrets/rp-metrics-int.pem" "$CLOUDUSER@$PUBLICIP:mdm.pem"
scp "$BASE/hack/local-monitor-testing/configureRemote.sh" "$CLOUDUSER@$PUBLICIP:"

ssh "$CLOUDUSER@$PUBLICIP" "sudo cp mdm.pem /etc/mdm.pem"
# Run through bash explicitly so the script does not need the executable bit.
ssh "$CLOUDUSER@$PUBLICIP" "sudo bash ./configureRemote.sh"


ssh $CLOUDUSER@$PUBLICIP "sudo docker pull $MDMIMAGE"

cat <<EOF > $BASE/dockerStartCommand.sh
docker run \
--entrypoint /usr/sbin/MetricsExtension \
--hostname $HOSTNAME \
--name $NAME \
-d \
--restart=always \
-m 2g \
-v /etc/mdm.pem:/etc/mdm.pem \
-v /var/etw:/var/etw:z \
$MDMIMAGE \
-CertFile /etc/mdm.pem \
-FrontEndUrl $MDMFRONTENDURL \
-Logger Console \
-LogLevel Warning \
-PrivateKeyFile /etc/mdm.pem \
-SourceEnvironment $MDMSOURCEENVIRONMENT \
-SourceRole $MDMSOURCEROLE \
-SourceRoleInstance $MDMSOURCEROLEINSTANCE
EOF


#disable SELINUX (don't shoot me)
ssh $CLOUDUSER@$PUBLICIP "sudo setenforce 0"
ssh $CLOUDUSER@$PUBLICIP "sudo getenforce"

#make it permanent
ssh $CLOUDUSER@$PUBLICIP "sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config"


ssh $CLOUDUSER@$PUBLICIP "sudo firewall-cmd --zone=public --add-port=12345/tcp --permanent"
ssh $CLOUDUSER@$PUBLICIP "sudo firewall-cmd --reload"


scp $BASE/dockerStartCommand.sh $CLOUDUSER@$PUBLICIP:
ssh $CLOUDUSER@$PUBLICIP "chmod +x dockerStartCommand.sh"
ssh $CLOUDUSER@$PUBLICIP "sudo ./dockerStartCommand.sh &"
52 changes: 52 additions & 0 deletions hack/local-monitor-testing/sample/dockerStartCommand.sh
@@ -0,0 +1,52 @@


BASE=$( git rev-parse --show-toplevel)

SOCKETPATH="$BASE/cmd/aro"

HOSTNAME=$( hostname )
NAME="mdm"
MDMIMAGE=linuxgeneva-microsoft.azurecr.io/genevamdm:master_20211120.1
MDMFRONTENDURL=https://int2.int.microsoftmetrics.com/
MDMSOURCEENVIRONMENT=$LOCATION
MDMSOURCEROLE=rp
MDMSOURCEROLEINSTANCE=$HOSTNAME


echo "Using:"

echo "Resourcegroup = $RESOURCEGROUP"
echo "User = $USER"
echo "HOSTNAME = $HOSTNAME"
echo "Containername = $NAME"
echo "Location = $LOCATION"
echo "MDM image = $MDMIMAGE"
echo " (version hardcoded. Check against pkg/util/version/const.go if things don't work)"
echo "Geneva API URL= $MDMFRONTENDURL"
echo "MDMSOURCEENV = $MDMSOURCEENVIRONMENT"
echo "MDMSOURCEROLE = $MDMSOURCEROLE"
echo "MDMSOURCEROLEINSTANCE = $MDMSOURCEROLEINSTANCE"

# Copying into /etc usually requires root; alternatively adapt the volume mount below.
cp "$BASE/secrets/rp-metrics-int.pem" /etc/mdm.pem




podman run \
--entrypoint /usr/sbin/MetricsExtension \
--hostname $HOSTNAME \
--name $NAME \
-d \
--restart=always \
-m 2g \
-v /etc/mdm.pem:/etc/mdm.pem \
-v $SOCKETPATH:/var/etw:z \
$MDMIMAGE \
-CertFile /etc/mdm.pem \
-FrontEndUrl $MDMFRONTENDURL \
-Logger Console \
-LogLevel Debug \
-PrivateKeyFile /etc/mdm.pem \
-SourceEnvironment $MDMSOURCEENVIRONMENT \
-SourceRole $MDMSOURCEROLE \
-SourceRoleInstance $MDMSOURCEROLEINSTANCE
32 changes: 32 additions & 0 deletions hack/local-monitor-testing/startMDMNetwork.sh
@@ -0,0 +1,32 @@
#!/bin/bash -e
set +x

BASE=$( git rev-parse --show-toplevel)
SOCKETFILE="$BASE/cmd/aro/mdm_statsd.socket"

echo "Using:"

echo "Resourcegroup = $RESOURCEGROUP"
echo "User = $USER"

VMName="$USER-mdm-link"
CLOUDUSER="cloud-user"


PUBLICIP=$( az vm list-ip-addresses --name $VMName -g $RESOURCEGROUP | jq -r '.[0].virtualMachine.network.publicIpAddresses[0].ipAddress' )

echo "Found IP $PUBLICIP, starting socat on the mdm-link vm"
ssh $CLOUDUSER@$PUBLICIP "sudo socat -v TCP-LISTEN:12345,fork UNIX-CONNECT:/var/etw/mdm_statsd.socket" &
sleep 3

echo "Starting SSH Tunnel"
ssh $CLOUDUSER@$PUBLICIP -N -L 12345:127.0.0.1:12345 &
sleep 3

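# Remove any stale socket from a previous run; socat cannot bind to a path that already exists.
# (A socket is not a regular file, so a [ -f ] test would never match it.)
rm -f "$SOCKETFILE"
echo "Starting local socat link to the tunnel"
socat -v "UNIX-LISTEN:$SOCKETFILE,fork" TCP-CONNECT:127.0.0.1:12345 &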

