Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPSec System Test troubleshooting guide #89

Merged
merged 1 commit into from
Jul 10, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
227 changes: 227 additions & 0 deletions tests/system-tests/ipsec/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,227 @@
# Telco Verification IPSec System Test Troubleshooting

## Overview
This document is intended to be a brief troubleshooting guide if there are problems
with the Telco Verification IPSec System Tests. Normally, if these tests fail, it
will be because the IPSec tunnel is not correctly established.

IPSec tunnels will have 2 endpoints, and in the context of this testing, one endpoint
will be the SNO cluster, and the other end is usually a Security Gateway. For these
test cases, the Security Gateway is usually a simple RHEL VM running in the lab.

## Telco Verification IPSec System Test cases
Currently, for the SNO use case, these are the test cases:

* ipsec_tunnel_deployment.go - verify the tunnel is established after deployment
* ipsec_packets_snoegress.go - verify packets egress the cluster via the IPSec tunnel
* ipsec_packets_snoingress.go - verify packets ingress the cluster via the IPSec tunnel

### Troubleshooting ipsec_tunnel_deployment test failure
This will most likely be the most common failure and will most likely fail because
the tunnel is not established.

If the tunnel is not established, it will most likely be for one of the following
reasons:

1. The Security Gateway is down, either the server or the tunnel on the server
2. There is no connectivity between the SNO cluster and the Security Gateway
3. The IP addresses have changed either on the SNO cluster or the Security Gateway

Find details on how to check these in the following sections.

#### Checking the Security Gateway

The remote end of the SNO IPSec tunnel is often-times called a Security Gateway.
Currently the Security Gateway is running on a RHEL VM which can be accessed using
the IP address stored statically in the `secgw_host_ip` parameter in the
`eco-gotests/tests/system-tests/ipsec/internal/ipsecconfig/default.yaml` file.
Set the `ECO_IPSEC_SECGW_HOST_IP` environment variable to override this IP
address in CI.

If it is not possible to connect to the Security Gateway, then it may be down or
have other issues that need to be fixed. The SNO IPSec tunnel cannot work if the
Security Gateway is not running correctly.

Once connected to the Security Gateway, verify the tunnel is up as explained below
in Appendix A1.

Additionally, check the IP Addresses are correct in the Security Gateway IPSec
configuration file:

```
sudo more /etc/ipsec.d/secgw.conf
conn secgw
left=10.1.28.190
leftid=%fromcert
leftrsasigkey=%cert
leftsubnet=172.16.123.0/24
leftcert=north
rightrsasigkey=%cert
right=10.1.101.129
rightid=%fromcert
authby=rsasig
auto=add
ikelifetime=24h
salifetime=1h
ikev2=insist
phase2=esp
fragmentation=yes
esp=aes256-sha1
ike=aes256-sha1
```

The `left` and `leftsubnet` configuration parameters refer to the local configuration,
which in this case is the Security Gateway configuration. The IP in `left` should
be the IP used to ssh to the Security Gateway, and `leftsubnet` should be the subnet
on the Security Gateway `ipsec` interface. The `right` configuration parameter should
be the IP address of the `br-ex` interface on the SNO cluster.

#### Checking connectivity between the Security Gateway and the SNO cluster

It must be possible to ping from the Security Gateway to the SNO cluster using the
SNO IP address configured in the `right` parameter in the Security Gateway IPSec
config file. If not, then it will be impossible to start the IPSec tunnel.

If there is a firewall running on the Security Gateway, it needs to be configured
to open ports for IPSec, as explained in Appendix A4.

#### Checking the SNO cluster IPSec configuration

If the SNO cluster used for IPSec testing is changed, then most likely the IP addresses
will have to be changed.

To change the SNO cluster IP, first update the `iperf3_server_sno_ip` parameter
in the `eco-gotests/tests/system-tests/ipsec/internal/ipsecconfig/default.yaml`
file. This parameter can be overridden in CI by setting the `ECO_IPSEC_IPERF3_SERVER_SNO_IP`
environment variable.

Additionally, the SNO cluster Machine Config must be created again and stored in
the SNO cluster ZTP git repo, as explained in Appendix A2.

### Troubleshooting ipsec_packets_snoegress test failure

This test case consists of first running an ssh command on the Security Gateway
to run the `iperf3` command in server mode, waiting for an iperf3 client connection
from the SNO cluster. The test case then creates a workload on the SNO cluster which
runs the `iperf3` application in client mode.

The test case could fail for one of the following reasons:

1. The `iperf3` binary is not available on the Security Gateway
2. A firewall on the Security Gateway is blocking the `iperf3` ports. Refer to Appendix A4.
3. The iperf3 image for the SNO cluster is not available. Refer to Appendix A3.
4. One of the IPs in `eco-gotests/tests/system-tests/ipsec/internal/ipsecconfig/default.yaml`
is incorrect.

### Troubleshooting ipsec_packets_snoingress test failure

This test case consists of first creating a workload on the SNO cluster which
runs the `iperf3` command in server mode, waiting for an iperf3 client connection
from the Security Gateway. The test case then runs an ssh command on the Security
Gateway to run the `iperf3` application in client mode.

The test case could fail for one of the following reasons:

1. The `iperf3` binary is not available on the Security Gateway
2. A firewall on the Security Gateway is blocking the `iperf3` ports. Refer to Appendix A4.
3. The iperf3 image for the SNO cluster is not available. Refer to Appendix A3.
4. One of the IPs in `eco-gotests/tests/system-tests/ipsec/internal/ipsecconfig/default.yaml`
is incorrect.

## Appendix

### A1 Checking the IPSec tunnel

The following 3 commands show different information about the IPSec tunnel:

```
sudo ipsec briefstatus
sudo ipsec showstates
sudo ipsec traffic
```

The following command shows much more information, and usually will not be necessary
unless more involved debugging is needed:

```
sudo ipsec briefstatus
```

It must be possible to ping the Security Gateway VPN IP address from the SNO cluster.
Notice this ping must be performed with the correct source IP, as follows:

```
# On the Security Gateway, get the VPN IP Address:
ip a show ipsec
5: ipsec: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
inet 172.16.123.10/24 brd 172.16.123.255 scope global noprefixroute ipsec
valid_lft forever preferred_lft forever
```

```
# On the SNO cluster, get the source IP address:
ip a show br-ex noprefixroute
14: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 6c:fe:54:58:ee:e4 brd ff:ff:ff:ff:ff:ff
inet 10.1.101.129/27 brd 10.1.101.159 scope global noprefixroute br-ex
valid_lft forever preferred_lft forever

# On the SNO cluster Ping the Security Gateway, via the IPSec tunnel
ping -I 10.1.101.129 172.16.123.10
PING 172.16.123.10 (172.16.123.10) from 10.1.101.129 : 56(84) bytes of data.
64 bytes from 172.16.123.10: icmp_seq=1 ttl=64 time=0.635 ms
64 bytes from 172.16.123.10: icmp_seq=2 ttl=64 time=0.381 ms
64 bytes from 172.16.123.10: icmp_seq=3 ttl=64 time=0.277 ms
```

### A2 Creating the OCP IPSec Machine Config

Complete instructions can be found here:

```
cnf-features-deploy/ztp/source-crs/optional-extra-manifest/README.md
```

It is best to use butane version `0.20.0`.

The most important parts of this is to place the latest certificates in this
directory, and to correctly configure the IP Addresses in the `ipsec-endpoint-config.yml`
file.

Once the Machine Configurations have been generated, store the updated `99-ipsec-master-endpoint-config.yaml`
file in the SNO cluster ZTP git repo.

### A3 The iperf3 ubi image

Since the `iperf3` application is not available on RHCOS, an image was created so
the application can be used for these test cases. This image was created with the
following Dockerfile: `eco-gotests/images/system-tests/ipsec-iperf3/Dockerfile`.

The image url can be found in the `iperf3tool_image` parameter in the
`eco-gotests/tests/system-tests/ipsec/internal/ipsecconfig/default.yaml` file.
This parameter can be overriden with the `ECO_IPSEC_TESTS_IPERF3_IMAGE`
environment variable.

### A4 The Security Gateway firewall

If the `firewall-cmd` command is not found, or there is an error saying that
`firewalld` is not running, then this part can be skipped.

To check the `firewalld` configuration, execute the following command:

```
sudo firewall-cmd --list-all
```

The following command will open ports for ipsec and iperf3:

```
firewall-cmd --add-service="ipsec"
firewall-cmd --add-port=30000/tcp
firewall-cmd --runtime-to-permanent
```

Notice that the port used for iperf3 is configured in the `node_port` parameter
in the `eco-gotests/tests/system-tests/ipsec/internal/ipsecconfig/default.yaml` file.
This parameter can be overridden with the `ECO_IPSEC_NODE_PORT` environment variable.
Loading