Entrypoint: Support for multiple pods and multiple stages #1824

Merged
merged 4 commits into from
Nov 10, 2020

Contributor

@darkmuggle darkmuggle commented Nov 2, 2020

In a previous PR, an unrelated question triggered a line of thinking
that illustrated that entrypoint was not being creative enough. This
is the result.

Prior to this change, COSA expected the orchestrator to know enough
about COSA to both set up and run COSA. When running COSA in various
environments, this is a high bar. This change moves the knowledge of "how
to run COSA" into COSA itself, namely:

  • Extends the JobSpec to support "stages". Each stage can run arbitrary
    commands such as "cosa build". Stages are run sequentially, but the
    commands within a stage can be run concurrently.
  • Full life-cycling of worker pods. Worker pods are created, monitored
    and deleted.
  • Defines, but does not yet implement, the ability to create
    non-blocking pods for stages.
  • Enables serving of files from the buildconfig pod to the worker pods.
    Worker pods can fetch sources and configs from the buildconfig pod.
  • Full streaming of logs to the buildconfig and saving of logs.
  • Worker pods return the results to the starting pod via Minio.
  • Stages can require an artifact, which requires something (i.e. a
    worker) to place it on the origin pod.
  • Stages support prep and post commands.

Worker pods are tightly coupled to the buildconfig pod, by design. A
worker pod is created from the same pod definition as the buildconfig.
entry builder determines whether it's running in buildconfig or worker
mode by checking for an OpenShift buildconfig spec or a COSA entrypoint
workSpec. When a worker comes up, it:

  • initializes /srv
  • auto-discovers secrets
  • runs the stage(s) assigned to it.
  • returns the results over Minio
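
As a rough illustration of the mode check described above, the sketch below picks a mode from the environment. The environment variable names are assumptions made for this example (the PR text does not name them), and `Mode`, `detectMode`, and the constants are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// Mode is a hypothetical enum for how the entrypoint should behave.
type Mode int

const (
	ModeUnknown Mode = iota
	ModeBuildConfig
	ModeWorker
)

// Both names below are assumptions for this sketch, not taken from the PR.
const (
	buildSpecEnv = "BUILD"              // assumed: serialized OpenShift build spec
	workSpecEnv  = "COSA_WORK_POD_JSON" // hypothetical: serialized COSA workSpec
)

func (m Mode) String() string {
	switch m {
	case ModeBuildConfig:
		return "buildconfig"
	case ModeWorker:
		return "worker"
	default:
		return "unknown"
	}
}

// detectMode checks for a workSpec first. Because a worker is created from
// the same pod definition as the buildconfig pod, this sketch assumes a
// worker could also see the build spec, so the workSpec takes priority.
func detectMode(lookup func(string) (string, bool)) Mode {
	if v, ok := lookup(workSpecEnv); ok && v != "" {
		return ModeWorker
	}
	if v, ok := lookup(buildSpecEnv); ok && v != "" {
		return ModeBuildConfig
	}
	return ModeUnknown
}

func main() {
	fmt.Println("mode:", detectMode(os.LookupEnv))
}
```

Whether the real check gives the workSpec priority is not stated in this thread; the ordering here is a design guess.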

The JobSpec now supports short-hand artifact creation. For example:

    job:
      build_name: me
      strict: true
    recipe:
      git_ref: testing-devel
      git_url: https://github.com/coreos/fedora-coreos-config
    stages:
     - id: base
       description: "building"
       build_artifacts:
       - base
       - metal
       - metal4k
     - id: live-iso
       own_pod: true
       description: "live ISO"
       require_artifacts:
       - base
       - metal
       - metal4k
       build_artifacts:
       - installer
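
One property a spec like the one above invites checking is that every `require_artifacts` entry is produced by some earlier stage. The sketch below is an illustration only: `Stage` and `checkArtifactOrder` are hypothetical names, and it assumes artifacts come solely from earlier stages in the same JobSpec, which is stricter than the PR describes (the description only says "something" must place the artifact on the origin pod).

```go
package main

import "fmt"

// Stage is a hypothetical, simplified view of a JobSpec stage that only
// carries the short-hand artifact fields.
type Stage struct {
	ID               string
	BuildArtifacts   []string
	RequireArtifacts []string
}

// checkArtifactOrder walks the stages in order and reports the first stage
// that requires an artifact no earlier stage builds.
func checkArtifactOrder(stages []Stage) error {
	built := map[string]bool{}
	for _, s := range stages {
		for _, a := range s.RequireArtifacts {
			if !built[a] {
				return fmt.Errorf("stage %q requires %q, which no earlier stage builds", s.ID, a)
			}
		}
		for _, a := range s.BuildArtifacts {
			built[a] = true
		}
	}
	return nil
}

func main() {
	// Mirrors the two stages from the YAML example.
	stages := []Stage{
		{ID: "base", BuildArtifacts: []string{"base", "metal", "metal4k"}},
		{
			ID:               "live-iso",
			RequireArtifacts: []string{"base", "metal", "metal4k"},
			BuildArtifacts:   []string{"installer"},
		},
	}
	if err := checkArtifactOrder(stages); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	fmt.Println("artifact ordering looks consistent")
}
```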

@darkmuggle
Contributor Author

darkmuggle commented Nov 2, 2020

Example of stages:

stages:
- ID: Awesome
  Commands: |
      cosa fetch
      cosa build --delay-meta-merge
- ID: Metal
  Commands:
      - cosa buildextend-metal
      - cosa buildextend-metal4k
  Concurrent: true
- ID: Clouds
  Commands:
      - cosa buildextend-aws
      - cosa buildextend-azure
      - cosa buildextend-gcp
      - cosa buildextend-digitalocean
  Concurrent: true

@darkmuggle darkmuggle added the WIP PR still being worked on label Nov 2, 2020
@darkmuggle
Contributor Author

Calling this a WIP. There's a bit to do here, but putting this up early.

Member

@cgwalters cgwalters left a comment

The commit message is good...but there's so much going on here I have trouble understanding how it all works; basically need to try to see some porting of existing cosa usage.

One high level thought - maybe it would be clearer if this was a separate quay.io/coreos/coreos-assembler-openshift container from a separate git repo or something?

Is there a particular section of this you think needs more review?

@darkmuggle
Contributor Author

> One high level thought - maybe it would be clearer if this was a separate quay.io/coreos/coreos-assembler-openshift container from a separate git repo or something?

💯 that's why there is an ocp/Dockerfile for that purpose

@darkmuggle
Contributor Author

Lifting the WIP.

@cgwalters there's nothing specific I need reviewed other than the "make sure I'm not doing something stupid" sniff check.

This is the minimum viable work needed to start draining the pipelines. This should be the last massive drop of code, and it works on OpenShift 3 and OpenShift 4. I anticipate the usual amount of follow-on PRs and I've tried to be really verbose in comments.

@darkmuggle darkmuggle removed the WIP PR still being worked on label Nov 8, 2020
Ben Howard added 4 commits November 8, 2020 10:52
- Added GoDeps
- Added schema generator.
- Fix CI by using ../tools/bin in the path
To aid in the coordination of artifacts between pods, entrypoint now
understands how to find a build. Entrypoint will also use the schema to
determine what artifacts it's capable of building.

.PHONY: clean
clean:
	@go clean .
	@rm -rf bin

.PHYON: schema
Member

.PHONY:

}
}

if buildID == "" {
Member

Can this code be moved to the block above?

@mike-nguyen
Member

mike-nguyen commented Nov 10, 2020

There is a lot going on here but overall looks sane to me. There are some minor things that can be handled in a followup PR. I will leave it up for a little longer to give more time for others to review.

Contributor

@bh7cw bh7cw left a comment

This looks sane to me. Maybe a follow-up PR to fix the typo. Merge it now so we can test the cosa remote.
/lgtm

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bh7cw, cgwalters, darkmuggle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,darkmuggle]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@darkmuggle
Contributor Author

darkmuggle commented Nov 10, 2020

/override ci/prow/sanity
/override tide

Tests passed and Prow showed success, but the status was not showing up here.

Status shows everything's okay... forcing the merge.

@openshift-ci-robot

@darkmuggle: Overrode contexts on behalf of darkmuggle: ci/prow/sanity, tide

In response to this:

/override ci/prow/sanity
/override tide

Status shows everything's okay... forcing the merge.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@darkmuggle darkmuggle merged commit f4f9a68 into coreos:master Nov 10, 2020
@darkmuggle darkmuggle deleted the feature/yolo-pods branch November 10, 2020 17:07
@cgwalters
Member

> /override tide

FWIW this shouldn't be necessary - tide is the thing that merges once the other PR contexts go green.

> /override ci/prow/sanity

Usually I'd go with at least a /retest first.

@dustymabe
Member

Looks like the COSA builds in Quay started failing right about the time this was merged:

[Image: Screenshot_2020-11-10 coreos-assembler coreos-assembler · Quay]

@dustymabe
Member

./build.sh make_and_makeinstall
11/10/2020, 5:11:12 PM
11/10/2020, 5:11:12 PM ---> Running in b5adc243cfb4
11/10/2020, 5:11:12 PM ++ pwd + srcdir=/root/containerbuild + '[' 1 -ne 0 ']' + make_and_makeinstall + make
11/10/2020, 5:11:12 PM cd tools && make
11/10/2020, 5:11:12 PM make[1]: Entering directory '/root/containerbuild/tools' mkdir -p bin
11/10/2020, 5:11:12 PM test -e bin/minio || \ go build -o bin/minio ./vendor/github.com/minio/minio
11/10/2020, 5:12:45 PM test -e bin/golangci-lint || \ go build -o bin/golangci-lint ./vendor/github.com/golangci/golangci-lint/cmd/golangci-lint
11/10/2020, 5:13:19 PM test -e bin/gosec || \ go build -o bin/gosec ./vendor/github.com/securego/gosec/cmd/gosec
11/10/2020, 5:13:21 PM test -f bin/schematyper || \ go build -o bin/schematyper ./vendor/github.com/idubinskiy/schematyper
11/10/2020, 5:13:23 PM make[1]: Leaving directory '/root/containerbuild/tools'
11/10/2020, 5:13:23 PM cd mantle && make
11/10/2020, 5:13:23 PM make[1]: Entering directory '/root/containerbuild/mantle'
11/10/2020, 5:13:23 PM ./build cmd/*
11/10/2020, 5:13:23 PM Building kola
11/10/2020, 5:15:02 PM Building kolet
11/10/2020, 5:15:10 PM Building ore
11/10/2020, 5:15:43 PM Building plume
11/10/2020, 5:15:48 PM make[1]: Leaving directory '/root/containerbuild/mantle'
11/10/2020, 5:15:48 PM cd entrypoint && make
11/10/2020, 5:15:48 PM make[1]: Entering directory '/root/containerbuild/entrypoint'
11/10/2020, 5:15:49 PM gofmt -d -e -l ./cmd/build.go ./cmd/entry.go ./cosa/build.go ./cosa/builds.go ./cosa/builds_test.go ./cosa/schema.go ./cosa/schema_doc.go ./cosa/schema_test.go ./cosa/v1.go ./ocp/bc.go ./ocp/bc_ci_test.go ./ocp/bc_test.go ./ocp/client.go ./ocp/const.go ./ocp/cosa-pod-s390x.go ./ocp/cosa-pod.go ./ocp/cosa_init.go ./ocp/errors.go ./ocp/filer.go ./ocp/filer_test.go ./ocp/k8s.go ./ocp/ocp.go ./ocp/remotes.go ./ocp/remotes_test.go ./ocp/return.go ./ocp/sa_secrets.go ./ocp/source_extract.go ./ocp/worker.go ./spec/clone.go ./spec/jobspec.go ./spec/jobspec_test.go ./spec/stage_test.go ./spec/stages.go ./spec/tmpl.go
11/10/2020, 5:15:49 PM golangci-lint run -v ./...
11/10/2020, 5:15:49 PM level=info msg="[config_reader] Config search paths: [./ /root/containerbuild/entrypoint /root/containerbuild /root /]"
11/10/2020, 5:15:49 PM level=info msg="[lintersdb] Active 10 linters: [deadcode errcheck gosimple govet ineffassign staticcheck structcheck typecheck unused varcheck]"
11/10/2020, 5:16:49 PM level=info msg="Memory: 602 samples, avg is 69.6MB, max is 69.6MB"
11/10/2020, 5:16:49 PM level=info msg="Execution took 1m0.05046157s"
11/10/2020, 5:16:49 PM level=info msg="[loader] Go packages loading at mode 575 (deps|files|imports|compiled_files|exports_file|name|types_sizes) took 1m0.046858849s"
11/10/2020, 5:16:49 PM level=error msg="Running error: context loading failed: failed to load packages: timed out to load packages: context deadline exceeded"
11/10/2020, 5:16:49 PM level=error msg="Timeout exceeded: try increasing it by passing --timeout option"
11/10/2020, 5:16:49 PM make[1]: *** [Makefile:20: fmt] Error 4
11/10/2020, 5:16:49 PM make[1]: Leaving directory '/root/containerbuild/entrypoint'
11/10/2020, 5:16:49 PM make: *** [Makefile:65: entry] Error 2
11/10/2020, 5:16:53 PM Removing intermediate container b5adc243cfb4

@dustymabe
Member

Looks like the build issue was fixed by #1856
