
[System test runner] Add more service deployers #89

Closed
ycombinator opened this issue Sep 9, 2020 · 22 comments

@ycombinator
Contributor

Follow up to #64.

Currently the system test runner only supports the Docker Compose service deployer; that is, it can only test packages whose services can be spun up using Docker Compose. We should add more service deployers to enable system testing of packages such as:

  • system (probably a no-op or minimal service deployer)
  • aws (probably some way to pass connection parameters and credentials via environment variables, and/or something that understands Terraform files)
  • kubernetes
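One way the runner could choose among such deployers is by the files a package provides. A minimal sketch in shell; the deploy directory layout, file names, and deployer names here are assumptions for illustration, not elastic-package behavior:

```shell
# Hypothetical dispatch: inspect a package's deploy directory and report
# which service deployer would handle it.
deployer_for() {
  dir="$1"
  if [ -f "$dir/docker-compose.yml" ]; then
    echo "docker-compose"   # the only deployer supported today
  elif [ -f "$dir/main.tf" ]; then
    echo "terraform"        # cloud packages: credentials via environment variables
  elif [ -d "$dir/k8s" ]; then
    echo "kubernetes"
  else
    echo "none"             # e.g. the system package may need no service at all
  fi
}
```

A "none" result would simply skip service setup, which may be all the system package needs.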

@mtojek
Contributor

mtojek commented Nov 19, 2020

For reference:

Testing on Kubernetes written by @ChrsMark : https://github.com/elastic/integrations/blob/master/testing/environments/kubernetes/README.md
(probably outdated now as there were many changes introduced to Fleet)

@mtojek
Contributor

mtojek commented Jan 5, 2021

We need to cover the following providers:

  • AWS
  • Azure
  • CloudFoundry
  • GCP
  • Kubernetes on GCP
  • OpenShift on GCP

(correct me if I missed any of them)

Notes:

  • I decided to split Kubernetes and OpenShift as I'm not sure if their configuration paths can be unified.
  • (priority) let's start with providers we should already support - AWS, Azure
  • I decided to run Kubernetes on GCP so as not to overload local CPU/memory resources, but we may need to figure out the Docker image distribution model (if we want to use and deploy custom images).

Other use cases:

  • As a developer I'd like to boot up and access a long-running Kubernetes stack, so I can interactively collect metrics and design Kibana dashboards

Technical observations:

Questions:

  1. Should the provider's stack be spawned similarly to the service's Docker-based stack, for the duration of test execution, or in a long-running manner like the Elastic stack?
  2. Should we provide an option of acquiring authorization data for the internal team?

@mtojek
Contributor

mtojek commented Jan 5, 2021

@kaiyan-sheng @narph @ChrsMark

Would you mind describing here use cases for AWS, Azure and Kubernetes? I'm looking forward to seeing how these cloud/infra providers can be used for testing integrations.

@ChrsMark
Member

ChrsMark commented Jan 7, 2021

Thanks for the ping @mtojek, I will try to provide a scenario, with inline comments/thoughts that would cover our k8s needs.

Vanilla Kubernetes

  1. Run elastic-package k8s up to bring up a k8s cluster. I don't think we should care where it is: on GKE, or locally on minikube or kind. Maybe it would be better to have it running on GKE for now, to avoid the extra step of installing minikube/kind (?). In this step all the required prerequisites should happen, like installing kube_state_metrics, from which the state_* metricsets will collect metrics.
  2. Run elastic-package test k8s (the syntax is abstract here for the sake of the example) so as to deploy the Agent on the running k8s cluster and enroll it in the Elastic Stack. To my mind the Elastic Stack should maybe be running on the same k8s cluster, so as to have easier networking configuration (similar to the approach mentioned in testing on k8s). After the test is completed the cluster is still up and the agents are still shipping metrics. To clean this up we need to run the next command to bring the whole cluster down.
  3. Run elastic-package k8s down, which will destroy the cluster, including the Elastic Stack and Agents.
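The three steps above could be sketched against kind. This is only a sketch: the cluster name, manifest paths, and the DRY_RUN wrapper are assumptions for illustration, not actual elastic-package behavior.

```shell
# With DRY_RUN=1 each command is printed instead of executed, so the flow
# can be inspected without a real cluster.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi; }

k8s_scenario() {
  run kind create cluster --name ep-test            # 1. k8s up
  run kubectl apply -f kube-state-metrics/          #    prerequisite for state_* metricsets
  run kubectl apply -f elastic-agent-k8s.yml        # 2. deploy and enroll the Agent
  # ... system tests run here while the cluster stays up ...
  run kind delete cluster --name ep-test            # 3. k8s down
}
```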

Note: I think this scenario can be expanded to test other packages like istio and ingress-controller by adding them as extra flags in step 1.

OCP

  1. This step will be the same as for vanilla k8s; the only difference will be the installation step, where we need an Openshift installation. If we want a from-scratch installation here, we need to run the GCP installer, which takes ~40 mins to bring the cluster up. Not sure if this can be part of a CI job; maybe it can be a nightly job. Related to Add Openshift scenario for CI beats#17962. Ping me directly for more info ;).
  2. Same as vanilla k8s, but we will most probably need slightly different manifests because of OCP restrictions.
  3. Same as vanilla k8s, but use the GCP installer script to bring the cluster down.

Note 1: This is only for testing the k8s module, but it should be quite similar for testing Autodiscover.
Note 2: Running Agent on k8s is not yet completely decided. Progress/discussion around this happens in the k8s-agent WP, cc: @blakerouse

@kaiyan-sheng
Contributor

For AWS testing, we can use a Terraform script (or anything similar) per data stream/package to create AWS services for testing and clean up after testing. I think we have an AWS account for testing in the Beats Jenkins (@jsoriano knows more about this) and we can leverage it here.

For metrics: an example could be running elastic-package test ec2-metrics locally to apply the Terraform script and create an EC2 instance in AWS, wait a while until EC2 metrics are sent to CloudWatch, check the events collected by the ec2-metrics package, and delete the EC2 instance at the end.

For logs: we already have sample files to test the pipelines, but it would be good to have Terraform set up S3-SQS to test the inputs.

There are two use cases here: one is to run this in CI, and the other is for package developers to test locally. Because creating services can be costly, we should consider how frequently we should run elastic-package test ec2-metrics in CI.
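The metrics flow described above amounts to an apply/test/destroy loop. A sketch; the package path, the wait time, and the DRY_RUN wrapper are assumptions, and destroy runs even if the test step fails so resources are not leaked:

```shell
# With DRY_RUN=1 commands are printed rather than executed.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi; }

ec2_scenario() {
  tf="packages/aws/data_stream/ec2_metrics/_dev/deploy/tf"  # assumed layout
  run terraform -chdir="$tf" init
  run terraform -chdir="$tf" apply -auto-approve      # create the EC2 instance
  run sleep 300                                       # let metrics reach CloudWatch
  run elastic-package test system || status=$?        # check collected events
  run terraform -chdir="$tf" destroy -auto-approve    # always delete the instance
  return "${status:-0}"
}
```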

@mtojek
Contributor

mtojek commented Jan 11, 2021

There are two use cases here: one is to run this in CI, and the other is for package developers to test locally. Because creating services can be costly, we should consider how frequently we should run elastic-package test ec2-metrics in CI.

With this PR, elastic/integrations#474, tests will be executed only if the relevant packages have changed (in this case the AWS integration) or if this is the master branch.

Regarding elastic-package test k8s and elastic-package test ec2-metrics, I think we need to come up with an open, flexible API, so that we don't have to modify the CLI every time we introduce a new platform, but this is something we'll research :) I admit I haven't looked at k8s as a separate stack, rather as a service under test that is alive for the duration of a test. Keeping it as a separate stack (like the Elastic stack) might actually simplify things.

@narph

narph commented Jan 11, 2021

For Azure we can look at something similar to the use case above. I previously worked on a POC using Pulumi, which authenticates the user, creates a storage account, fetches metrics, validates them, and then removes the entire deployment.
I hope it is of interest here: elastic/beats#21850.
Maybe something like elastic-package test azure storage could replace the entire process.
For Azure logs, more steps are required; for example, after creating the event hub we will have to populate it with some valid/invalid messages.
Not sure how much detail we should go into in this issue.

@mtojek
Contributor

mtojek commented Jan 14, 2021

I'm taking this issue.

@mtojek
Contributor

mtojek commented Jan 14, 2021

Thank you for all the feedback, folks! We had a sync-up with @ycombinator to discuss possible options.
Technically, we'll try to implement a generic Terraform-based test runner. We wouldn't like to include AWS/Azure/K8s references in the CLI; let's try to make it as generic as possible. The approach will be truly declarative, which is in line with the original principle (no programming language required).

Here is a list of action items to help us solve this issue.

Dev changes in package-spec:

  • Allow for data-stream level _dev/deploy definitions
    or
  • Mount extra files for the data stream at runtime (it may avoid building the image multiple times)

@ycombinator, I still have doubts about which path we should follow. If you have any preferences or see benefits in either of them, please feel free to share.

Changes in elastic-package:

Changes in integrations:

@ChrsMark
Member

Thanks for the heads-up @mtojek! Feel free to reach out to me if you have any questions about the k8s specifics, since it can be tricky with the different components we collect from, unlike other clouds where we define a single exposed endpoint.

@kaiyan-sheng
Contributor

With this PR, elastic/integrations#474, tests will be executed only if the relevant packages have changed (in this case the AWS integration) or if this is the master branch.

Great, thank you!

@ycombinator
Contributor Author

Thanks for the write up and breakdown of tasks, @mtojek. Very helpful!

Dev changes in package-spec:

  • Allow for data-stream level _dev/deploy definitions
    or
  • Mount extra files for the data stream at runtime (it may avoid building the image multiple times)

@ycombinator, I still have doubts about which path we should follow. If you have any preferences or see benefits in either of them, please feel free to share.

I recall discussing the first option (allow for data-stream level _dev/deploy definitions) in our meeting today, but not the second one (mount extra files for the data stream at runtime). Would you mind explaining some details about the second option? Thanks.

@mtojek
Contributor

mtojek commented Jan 14, 2021

(I came to this point from observing the Zeek integration.)

I can elaborate on this. Imagine we have an integration XYZ with data streams A, B, C, ..., Z. Every data stream is basically the same Docker image with a Terraform executor and its own set of static tf templates. The improvement is to use a single Docker image and simply mount (switch) the templates for the data stream test scenario. This way it will be faster than building a new Docker image per data stream.
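The mount switch could be as simple as computing a per-data-stream bind mount for one shared executor image; the directory layout, helper name, and image name here are hypothetical:

```shell
# Build the docker -v argument that swaps in one data stream's templates.
tf_mount_for() {
  echo "$1/data_stream/$2/_dev/deploy/tf:/workspace"
}

# Illustrative usage with an assumed shared executor image:
#   docker run --rm -v "$(tf_mount_for "$PWD" ec2_metrics)" \
#     -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY terraform-executor apply
```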

@ycombinator
Contributor Author

ycombinator commented Jan 14, 2021

I always assumed (but probably didn't make it explicit, sorry!) that there would be one shared/common TF executor Docker image used by the TF service deployer. The definition and maintenance of this image is the responsibility of elastic-package developers, as opposed to package developers.

The part that varies is the TF templates, whether those come from the package level ({package}/_dev/deploy/tf/...) or the data stream level ({package}/data_stream/{data stream}/_dev/deploy/tf/...). The definition and maintenance of these templates is the responsibility of package developers.

So I think we're on the same page?

@mtojek
Contributor

mtojek commented Jan 14, 2021

The part that varies is the TF templates, whether those come from the package level ({package}/_dev/deploy/tf/...) or the data stream level ({package}/data_stream/{data stream}/_dev/deploy/tf/...). The definition and maintenance of these templates is the responsibility of package developers.

I agree with the rest of your comment. Regarding the quoted paragraph: what is the best way of processing these TF templates (belonging to particular data streams)? Load them at runtime? Include them at build time (one image build per data stream)?

(I think we're on the same page, just confirming the implementation details :)

@ycombinator
Contributor Author

ycombinator commented Jan 14, 2021

Load them at runtime? Include them at build time (one image build per data stream)?

There is also a third option: include all of them at image build time (so you are not building one image per data stream) and then select the right data stream's templates at runtime.
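This third option could be sketched as an entrypoint step inside the shared image: all template sets are baked in, and an environment variable selects one at runtime. The /templates layout and the DATA_STREAM variable are assumptions for illustration:

```shell
# Copy the baked-in templates for the requested data stream into the
# Terraform working directory; fails fast if no data stream was selected.
select_templates() {
  src="$1/${DATA_STREAM:?DATA_STREAM must be set}"
  mkdir -p "$2" && cp -R "$src/." "$2/"
}

# Illustrative entrypoint usage:
#   select_templates /templates /workspace && terraform -chdir=/workspace apply
```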

At any rate, I don't know if there's an obvious answer to this one. I would suggest trying one of the options, probably the one you think is simplest to implement, see how well it performs and then iterate from there as necessary.

@jsoriano
Member

jsoriano commented Jan 15, 2021

+1 to implement this as a generic declarative Terraform-based runner 👍

Some comments in case they are helpful:

  • In Run kubernetes integration tests inside of a pod and use kind to setup a kubernetes cluster beats#17656, Blake extended mage goIntegTest for Metricbeat to be able to run tests in Kubernetes (with kind) apart from the usual docker compose. There it was also done in a generic way: one provider or the other was used depending on the available files. A similar approach could be followed here to continue supporting docker-compose, or if we want to support other providers in the future.
  • A use case I think can be powerful for some complex scenarios is the use of Kubernetes operators to provide testing scenarios. Operators make complex deployments easier. I have used https://strimzi.io/ in the past to deploy Kafka clusters with different authentication methods (something quite painful and error-prone to do from scratch), and https://kubecf.io/ to deploy Cloud Foundry clusters (the best method I have found so far to reliably deploy CF in an automated way).
    The usual approach for them is to create a namespace, install them there with helm, and use them with custom resources. In principle all of this is supported by Terraform (namespace, helm, and custom resources), but I would like to see it validated when this is implemented 🙂
  • Common code can be tricky with Terraform. This is more an implementation detail, but something to take into account, because it is going to cause problems at some point. You will find cases where sharing some Terraform code between scenarios will be useful (or even needed). For example, when defining the providers, the authentication methods can differ depending on whether the tests run locally or in CI (e.g. to use a private docker registry in CI), and can be shared between packages (e.g. multiple packages using the aws or kubernetes providers); maybe a way to solve this is not to include the provider definitions in the scenarios, and to generate them when running the tests depending on some config. Another example is shared code itself: it may be tempting to rely on some resource always being present in a cloud provider, but this can be unreliable and can make it difficult to support packages living in their own repositories. An example of this is the usual network configuration needed when deploying instances in a cloud provider; maybe elastic-package should always provide some base resources when specific providers are used, so scenarios can be simpler. Same with Kubernetes: a scenario could define some Kubernetes resources, but elastic-package would provide the cluster and the credentials.
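The point about not committing provider definitions to each scenario could be sketched as generating them at test time; the CI detection and the profile name below are assumptions:

```shell
# Emit a provider block that matches the environment the tests run in,
# so scenarios stay free of authentication details.
gen_providers() {
  if [ -n "${CI:-}" ]; then
    # CI: credentials are injected as environment variables
    printf 'provider "aws" {}\n'
  else
    # local: use a named profile from the developer machine
    printf 'provider "aws" {\n  profile = "elastic-package-dev"\n}\n'
  fi
}

# Illustrative usage, writing next to the scenario's own .tf files:
#   gen_providers > providers.tf
```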

@mtojek
Contributor

mtojek commented Jan 15, 2021

Thank you for sharing your thoughts, lots of tricky ideas ;) I like the idea of kops.

In elastic/beats#17656 Blake extended mage goIntegTest for Metricbeat to be able to run tests in Kubernetes (with kind) apart of the usual docker compose. There it was also done in a generic way, one provider or the other were used depending on the available files. A similar approach could be followed here to continue supporting docker-compose, or if we want to support other providers in the future.

Honestly, I think we're not there yet. First, the Elastic Agent needs to support autodiscovery and the Kubernetes runtime. Then we can think about potential integrations.
Keep in mind that we'd like to examine integrations, not the entire end-to-end flow. I would leave the verification of Elastic Agent functionality in different runtimes to the Agent or the e2e-tests.

@ChrsMark
Member

@mtojek @ycombinator FYI, for k8s package testing I'm using some mock APIs so as to make progress until we reach a more permanent solution. You can find more at elastic/integrations#569.

While working with these mocks I realize more and more the need for running against an actual k8s cluster, and more specifically for having the Agent deployed natively on the cluster. Without this, many things we need, like k8s tokens, certs, etc., will not be valid.

@ycombinator
Contributor Author

While working with these mocks I realize more and more the need for running against an actual k8s cluster, and more specifically for having the Agent deployed natively on the cluster. Without this, many things we need, like k8s tokens, certs, etc., will not be valid.

This is super valuable information. @mtojek and I have informally discussed the idea that for some service deployers it might make sense to deploy the Agent "alongside" the service; your findings seem to be along these lines, so this is very valuable feedback. Thank you!

@mtojek
Contributor

mtojek commented Feb 3, 2021

@kaiyan-sheng AWS integration can be tested now using the Terraform executor (sample here: https://github.com/elastic/integrations/tree/master/packages/aws/data_stream/ec2_metrics).

@narph this feature is written in a generic way. If you pass secrets for Azure and write some TF code, it's expected to work.

EDIT:

We just need to enable secrets on the Jenkins side, but that shouldn't be a big issue (unless we don't have them generated at all).

@mtojek
Contributor

mtojek commented Feb 18, 2021

Let me summarize:

We've delivered (and applied in Integrations):

  • a generic Terraform service deployer, which currently supports AWS and possibly other providers like Azure, GCP, etc., using environment variables to pass credentials
  • a Kubernetes service deployer, which uses kind and potentially additional resources (e.g. a custom application deployment)

@mtojek mtojek closed this as completed Feb 18, 2021