0066-dogfooding-tekton.md

status: proposed
title: Dogfooding Tekton
creation-date: 2021-05-16
last-updated: 2021-05-16
authors: afrittoli

TEP-0066: Dogfooding Tekton

Summary

The Tekton community has been "dogfooding" Tekton since the early stages of the project.

The term "dogfooding" is IT slang for the use of one's own products. In some uses, it implies that developers or companies are using their own products to work out bugs, as in beta testing. One benefit of dogfooding is that it shows that a company is confident about its products.

We would like to capture the goals of the dogfooding effort in the Tekton community, and to document its guiding principles, existing achievements, and future roadmap.

Motivation

The dogfooding effort in Tekton has been going on since the early stages of the project, so why write a TEP now?

When the dogfooding work started, we did not have a TEP process in place and relied on shared Google documents instead. This TEP captures the work done thus far, its guiding principles, and the future roadmap. It aims to raise awareness of this work in the community and to make it easier for new developers to get started and contribute to it.

Goals

Tekton is non-opinionated by design. It allows designing CI/CD pipelines and systems, but it is not prescriptive about how to do that. The dogfooding effort offers CI/CD services to the Tekton community through Tekton, and in doing so it must make design decisions about how to build such services with Tekton. The goals of this TEP are to:

  • Formalize the motivation for the dogfooding effort
  • Describe the design principles for CI/CD services based on Tekton
  • Identify the services implemented through Tekton
  • Identify areas where more work is needed and define a future roadmap

Non-Goals

This TEP does not aim to design specific services based on Tekton. That work will be tracked in dedicated TEPs where needed.

Use Cases (optional)

Use cases that the dogfooding effort implements are:

  • Experience Tekton from the operator, author and user points of view, which allows the Tekton community to:
    • Identify missing functional features and usability issues
    • Identify operational problems and security concerns
    • Discover bugs that are easier to identify in a running system, such as upgrade issues and problems related to maintainability and scale
    • Help validate nightly and full releases, and discover regressions and integration issues
  • Offer CI/CD services to the Tekton community with minimal dependencies on other products, and thus greater control over the outcome
  • Build a repository of examples that embody best practices and that Tekton users can tap into

Requirements

  • Provide reliable CI/CD services to the Tekton community
  • Dogfood as many Tekton projects as possible, from the most stable and mature to the experimental ones
  • Provide a means to experiment with alpha and experimental features and build a path toward stable
  • Provide documentation about how to operate dogfooding services, as well as how to contribute to their development
  • Keep a low barrier to entry for experimentation and, at the same time, safeguard production-grade dogfooding services

Proposal

Notes/Caveats (optional)

Risks and Mitigations

We want to offer continuous integration and continuous delivery services to the Tekton community, which are essential to the development of Tekton itself. This conflicts with the need to dogfood alpha-stability and even experimental services. Mitigations for this are:

  • Use a multi-cluster setup:
    • One cluster runs nightly releases and experimental code. This cluster may offer services, but no essential ones, so that Tekton development may continue normally if this cluster is broken.
    • One cluster runs major releases of Tekton components (when available). Until we are able to provide automated service verification and rollback capabilities, deployments to this cluster are vetted by humans. Write and admin access to this cluster is restricted to a limited number of individuals. This cluster may offer essential services.
  • The build-captain role: there is always at least one person per day on duty (at least during working hours in one time zone) to verify the status of CI/CD services and respond to incidents.
  • Introduce Tekton-based services incrementally. For experimental components, if an alternative is available, use the experimental component only in a few places until the component has proven stable enough.
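
The multi-cluster rules above can be sketched as a small placement policy. This is only an illustrative model, not the actual dogfooding tooling: the cluster names `experimental` and `stable`, the `Build` categories, and the helper functions are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Build(Enum):
    """Kinds of Tekton builds (hypothetical categories for this sketch)."""
    EXPERIMENTAL = "experimental"
    NIGHTLY = "nightly"
    RELEASE = "release"


@dataclass(frozen=True)
class Cluster:
    name: str                  # hypothetical name, not taken from the TEP
    builds: frozenset          # kinds of builds deployed to this cluster
    essential_services: bool   # may this cluster host essential services?
    human_vetted: bool         # are deployments reviewed by a person first?


# Assumed two-cluster layout mirroring the mitigation list above.
CLUSTERS = (
    Cluster("experimental", frozenset({Build.EXPERIMENTAL, Build.NIGHTLY}),
            essential_services=False, human_vetted=False),
    Cluster("stable", frozenset({Build.RELEASE}),
            essential_services=True, human_vetted=True),
)


def target_cluster(build: Build) -> Cluster:
    """Return the cluster that a given kind of build is deployed to."""
    for cluster in CLUSTERS:
        if build in cluster.builds:
            return cluster
    raise ValueError(f"no cluster accepts {build.value} builds")


def can_host(build: Build, essential: bool) -> bool:
    """Check whether a service backed by this build may run where it lands."""
    cluster = target_cluster(build)
    return cluster.essential_services or not essential
```

Under these assumptions, `can_host(Build.NIGHTLY, essential=True)` is rejected while `can_host(Build.RELEASE, essential=True)` is allowed, so a broken nightly cannot take down a service that Tekton development depends on.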

Using experimental Tekton components in production may raise security concerns. Mitigations are somewhat similar to those already discussed:

  • Use experimental components for non-essential services. This allows stopping a service completely in case a critical security issue is discovered. When a non-experimental alternative is not available, we may use experimental components for essential services too, but we should document the impact of taking down the service and, where possible, provide a mitigation plan until service can be restored.
  • Fully automate the setup of the clusters, and document any exception. In case secrets are exposed, this makes it easy to re-create the clusters with new secrets and restore secure services.

These risks have the positive side effect of increasing awareness of critical issues in the community, as well as the motivation to resolve them promptly. Now that Tekton is more widely adopted, we see a similar effect with Tekton users who stay on top of the latest Tekton releases: when a release breaks a feature that is not well exercised through testing and dogfooding, issues are reported right after the release.

User Experience (optional)

Performance (optional)

Design Details

Test Plan

Design Evaluation

Drawbacks

Alternatives

Infrastructure Needed (optional)

Upgrade & Migration Strategy (optional)

References (optional)