Questions on migrating kubespray CI to test-infra #31351

VannTen · 2023-12-01T16:06:47Z

Hi,

We're currently evaluating migrating the CI of the kubespray project to test-infra from gitlab-ci, and I have some questions on what we can and cannot do with prow and test-infra, so we can decide whether it can work:

Currently, we're handling jobs in gitlab ci stages, because some take a lot of time and we try to fail early.

I understand prow does not have a job dependency concept, so I have two possible strategies in mind:

Cancel presubmits on any failure of one of the presubmits. Is that possible / does it work well with things like /retest ?
Use tekton pipelines
-> any strategy on how to avoid costly jobs we know won't matter because one failed ?

Regarding tekton pipelines:

Can't use pod utilities => tasks catalog can maybe help with that, I think, just need to wrap task correctly.
Not sure what becomes of the /test <job-name> stuff. I suppose it's not possible to restart individual parts of the pipeline as if they were Prowjobs ?

Some of our jobs currently provision kubevirt VM (https://github.com/kubernetes-sigs/kubespray/blob/master/tests/cloud_playbooks/roles/packet-ci/templates/vm.yml.j2) to test kubespray runs on them. Is there something in prow/test-infra which can do that for us ? (Didn't find anything but well it does not hurt to ask).

Regarding compute resources:

What's the policy on amount of resources which a project can use ? We have rather big CI runs so this might be a concern. I understand that we tell Prow to execute jobs in a specific cluster, can we bring our own ? Should we ?

That's a lot of different questions in different directions, but I'm trying to figure things out, so sorry if this is a bit unclear.

Related issue on kubespray : kubernetes-sigs/kubespray#10682

Cc @floryut @ant31 from kubespray

Thanks

VannTen · 2023-12-01T16:07:50Z

/sig testing
/sig cluster-lifecycle

aojea · 2023-12-01T16:17:41Z

What's the policy on amount of resources which a project can use ? We have rather big CI runs so this might be a concern. I understand that we tell Prow to execute jobs in a specific cluster, can we bring our own ? Should we ?

@BenTheElder @ameukam @upodroid for resource usage

aojea · 2023-12-01T16:17:59Z

/sig k8s-infra

BenTheElder · 2023-12-01T17:06:36Z

So far really large testing is basically only done for scale testing the core Kubernetes project.

These are all relative terms though.

SIG K8s Infra owns the actual resource policy, which is not well defined yet, but I can speak to it a little as a lead in both SIGs, can you be more specific about what you're intending to run?

We just went through measures this year to reduce spend, and we're resuming the process of finishing moving lingering CI/resources out of google.com projects into kubernetes.io on GCP in particular.

Cancel presubmits on any failure of one of the presubmits. Is that possible / does it work well with things like /retest ?

No, this is not supported, please don't put lots of expensive testing in presubmit.
You should only test commonly broken workflows in presubmit and the rest in postsubmit / periodic.

Regarding tekton pipelines:

Not supported on prow.k8s.io, sorry.

Some of our jobs currently provision kubevirt VM (https://github.com/kubernetes-sigs/kubespray/blob/master/tests/cloud_playbooks/roles/packet-ci/templates/vm.yml.j2) to test kubespray runs on them. Is there something in prow/test-infra which can do that for us ? (Didn't find anything but well it does not hurt to ask).

No, please do not try to use kubevirt on our clusters (this is why we created KIND), you'll need to spin up remote machines.

neolit123 · 2023-12-01T17:56:33Z

kubespray has been a bit of a black sheep, where the project over the years has drifted away from the commons - it has its own CI and Zoom account, not using the community ways...that's not necessarily bad, but rather peculiar.

We're currently evaluating migrating the CI of the kubespray project to test-infra from gitlab-ci,

could you explain what might be the reasons for such a migration?
(i don't think i saw them listed in the OP)

VannTen · 2023-12-01T19:59:23Z

can you be more specific about what you're intending to run?

Here is a typical PR run: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/pipelines/1091836466
What takes the most time is the job deploy-part2, which basically spins up VMs and plays kubespray on it with various configuration (network_plugin, base OS, container runtime).

No, this is not supported, please don't put lots of expensive testing in presubmit. You should only test commonly broken workflows in presubmit and the rest in postsubmit / periodic.

Postsubmits and periodics do not guard merging in the main branch, is that correct ? Understood though.

Regarding tekton pipelines:

Not supported on prow.k8s.io, sorry.

ACK. Just one question, does that depend on the service cluster (where prow itself runs) or the build cluster ?

No, please do not try to use kubevirt on our clusters (this is why we created KIND), you'll need to spin up remote machines.

Ok. But is bringing our own cluster as a Prow "build cluster" a possibility, or not at all ?

We're currently evaluating migrating the CI of the kubespray project to test-infra from gitlab-ci,

could you explain what might be the reasons for such a migration? (i don't think i saw them listed in the OP)

Sorry for that, I listed them in the issue on kubespray side. Basically:

currently /ok-to-test & /retest don't work (well, /ok-to-test allows the one presubmit job listed here ^^).
The lack of /retest in particular is annoying because it basically forces us to re-push on flaky tests, which loses the /lgtm label -> but there might be a resolution on that if the prow migration proves unfeasible.
the integration between github and gitlab-ci loses some context, so conditional jobs based on which files changed do not work
maintaining all of that instead of sharing that with the wider k8s community is more work (probably).

BenTheElder · 2023-12-04T22:12:50Z

Postsubmits and periodics do not guard merging in the main branch, is that correct ? Understood though.

Right, if you find an issue you can revert. We have testgrid.k8s.io to aid in that.
If you're frequently reverting because of changes not caught in presubmit, consider presubmit.

But we consider for example 5,000 node scale tests as an extreme example. We find bugs surfaced in those tests and yet they do not gate all PR merges because this is unreasonably expensive.

ACK. Just one question, does that depend on the service cluster (where prow itself runs) or the build cluster ?

The project doesn't have anyone maintaining support for this. We support prow decorated jobs.

Ok. But is bringing our own cluster as a Prow "build cluster" a possibility, or not at all ?

K8S Infra is only using community managed resources going forward because we've been bitten repeatedly with issues depending on third party controlled accounts etc.

We do not support "bring your own", if anyone wants to help fund the project with assets they can talk to the CNCF about setting up something like https://www.cncf.io/google-cloud-recommits-3m-to-kubernetes/ which SIG K8s Infra administers and SIG Testing uses to run CI.

What takes the most time is the job deploy-part2, which basically spins up VMs and plays kubespray on it with various configuration (network_plugin, base OS, container runtime).

This looks like an expansive test matrix as large as we'd typically do in periodic testing only, not on every PR.

It's difficult to understand what sort of expense we're talking about here though, just seeing the gitlab pipeline names.
Admittedly I haven't really had time yet to try to uncover what exactly they all run and how resource heavy they are.

Generally when sig subprojects have started using our CI in the past they've had relatively minimal needs, some cheap unit tests and so on. We have not had a new distro / deployment tool onboard in a long time since maybe cluster API so there's not a lot of precedent here.

dims · 2023-12-04T22:20:32Z

@VannTen Can you please share a bit of the background on the gitlab infra itself? Who paid for it? who set it up? For some of us this is fresh unforeseen news sadly!

VannTen · 2023-12-05T10:58:24Z

Postsubmits and periodics do not guard merging in the main branch, is that correct ? Understood though.

Right, if you find an issue you can revert. We have testgrid.k8s.io to aid in that. If you're frequently reverting because of changes not caught in presubmit, consider presubmit.

But we consider for example 5,000 node scale tests as an extreme example. We find bugs surfaced in those tests and yet they do not gate all PR merges because this is unreasonably expensive.
...
This looks like an expansive test matrix as large as we'd typically do in periodic testing only, not on every PR.

Yeah, I think it is.

Ok. So in our cases, for example, that would translate to the test matrix moving to periodics, and keeping one/some default configuration tests in presubmits ?
Maybe we could also use run_if_changed to target some configuration ? (network plugin when corresponding role was touched, etc)
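To make the `run_if_changed` idea concrete, here is a minimal Python sketch of the selection logic, on the understanding that Prow treats `run_if_changed` as a regular expression matched against each changed file path. The job names, path patterns, and the `select_presubmits` helper are all hypothetical and only illustrate the idea; this is not Prow code.

```python
import re

# Hypothetical job names and path patterns, for illustration only.
RUN_IF_CHANGED = {
    "pull-kubespray-calico": r"^roles/network_plugin/calico/",
    "pull-kubespray-cilium": r"^roles/network_plugin/cilium/",
    "pull-kubespray-docs-lint": r"\.md$",
}

# Stand-in for the one or few "default configuration" presubmits that always run.
ALWAYS_RUN = ["pull-kubespray-default"]


def select_presubmits(changed_files):
    """Return the jobs to trigger for a PR that touches `changed_files`."""
    selected = list(ALWAYS_RUN)
    for job, pattern in RUN_IF_CHANGED.items():
        regex = re.compile(pattern)
        # A job is selected when at least one changed path matches its pattern.
        if any(regex.search(path) for path in changed_files):
            selected.append(job)
    return selected


if __name__ == "__main__":
    print(select_presubmits(["roles/network_plugin/calico/tasks/main.yml"]))
```

With this shape, a PR touching only the calico role would trigger the default job plus the calico job, and the rest of the matrix would be left to periodics.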

It's difficult to understand what sort of expense we're talking about here though, just seeing the gitlab pipeline names. Admittedly I haven't really had time yet to try to uncover what exactly they all run and how resource heavy they are.

The jobs in deploy-part2 typically run for around 40 minutes, and use 1 to 3 VMs (using kubevirt) (see here + the job itself).
I can't find the VM size, it's defined as small for kubevirt.

Given there are around 20-25 configurations, that adds up.
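As a rough back-of-the-envelope check, the figures above can be turned into VM time per pipeline run. This sketch reads the comment as 20 to 25 configurations, each running about 40 minutes on 1 to 3 VMs; it assumes every job runs once per pipeline and ignores the unknown VM size, so it only bounds the order of magnitude.

```python
# Figures quoted in the comment; the way they are combined is an assumption.
JOB_MINUTES = 40
CONFIGS = (20, 25)      # low and high count of configurations
VMS_PER_JOB = (1, 3)    # low and high count of VMs per job


def vm_minutes(configs, vms_per_job, job_minutes=JOB_MINUTES):
    """VM-minutes consumed if every configuration runs once."""
    return configs * vms_per_job * job_minutes


low = vm_minutes(CONFIGS[0], VMS_PER_JOB[0])    # 20 * 1 * 40 = 800
high = vm_minutes(CONFIGS[1], VMS_PER_JOB[1])   # 25 * 3 * 40 = 3000

# Prints: 13.3 to 50.0 VM-hours per pipeline run
print(f"{low / 60:.1f} to {high / 60:.1f} VM-hours per pipeline run")
```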

Generally when sig subprojects have started using our CI in the past they've had relatively minimal needs, some cheap unit tests and so on. We have not had a new distro / deployment tool onboard in a long time since maybe cluster API so there's not a lot of precedent here.

Can you please share a bit of the background on the gitlab infra itself? Who paid for it? who set it up? For some of us this is fresh unforeseen news sadly!

I'll share what I can, I don't have all the information or history.

The integration github <-> gitlab-ci was done by @ant31 if I'm correct, and uses https://github.com/failfast-ci/failfast-api

The infrastructure was provided by Packet (which is now Equinix Metal), and I think it's on CNCF cloud credits.
The PRs kubernetes-sigs/kubespray#4538 kubernetes-sigs/kubespray#4537 were made by @woopstar but I'm not exactly sure who handled the cluster setup and the interaction with Packet/Equinix.
Currently, at least @yankay and @floryut have access to it and fix the occasional breakage.

(If some of the people mentioned have more info, feel free to correct or clarify 👍 )

ant31 · 2023-12-05T11:30:49Z

Hi all,

The background is that kubespray started with Kubernetes 1.0, so there was little around to help the community.
The community CI used to be Travis-CI. Unfortunately, kubespray was using too many resources, and we had to move out.

CNCF allocated us a few bare-metal nodes (and still does) to run our pipelines.

We are deploying and maintaining those nodes ourselves.
They are running the Gitlab-runners and we're deploying most of the VM (via kubevirt) on it, too.
We also have, or used to have, a few jobs deploying k8s on GCE VM (to test the cloud settings).

Why Gitlab-ci?

2016-2017 The gitlab-ci was a good alternative to combine low maintenance (only need to deploy the runner), and it checkboxes most of the requirements (complex pipeline, with manual jobs and stages), we filled in the missing github integration and features with https://github.com/failfast-ci/failfast-api

We create empty VMs to mimic end-user environments:

Centos+Calico+... configuration.
Ubuntu+Cilium+... configuration
....
The VM are base images: kubespray is doing the OS configuration + kubernetes deployments, and test upgrades.

Moving to prow would remove the need to maintain bare-metal nodes, failfast-ci project, and a few other benefits, but we must be able to configure an equivalent pipeline.

neolit123 · 2023-12-05T11:39:18Z

This looks like an expansive test matrix as large as we'd typically do in periodic testing only, not on every PR.

same for kOps?
https://testgrid.k8s.io/sig-cluster-lifecycle-kops

We create empty VMs to mimic end-user environments:

the test matrix with CNI, distro is redundantly complex (see the kOps case above).
kubeadm and CAPI don't do that, it's too much...and a bit crazy.
if CNI foo decided to regress the k8s infra should not pay $$$ for it.
for distros, there are a 100 Linux flavors.

i wouldn't want us to say -1 on kubespray if they want to move to prow, but if i could, i'd happily take 50% of the test bandwidth of kOps and give it to kubespray.

upodroid · 2023-12-05T14:20:11Z

The bulk of the CI that we run that requires testing on a real virtual machine involves creating VMs on AWS/GCP. We have tooling that handles that for us and you would need to adopt it.

kops is a good example of what you'll need to do to adopt the Kubernetes CI.

Here are a couple of examples:

https://github.com/kubernetes-sigs/provider-aws-test-infra
https://github.com/kubernetes/kops/tree/master/tests/e2e/kubetest2-kops
https://github.com/kubernetes/cloud-provider-gcp/blob/master/e2e/scenarios/kops-simple This is a shell version of the previous example

VannTen · 2023-12-05T15:26:17Z

On Mon, Dec 04, 2023 at 02:13:02PM -0800, Benjamin Elder wrote:

Postsubmits and periodics do not guard merging in the main branch, is that correct ? Understood though.

Right, if you find an issue you can revert. We have testgrid.k8s.io to aid in that. If you're _frequently_ reverting because of changes not caught in presubmit, consider presubmit.

Another question about that: what's the typical frequency for periodics ? Daily, weekly ? Do other projects have some strategy in place to avoid breakage in their main branch ? Having a separate "dev" branch for instance, only merged in the main branch at the same frequency as the periodics run ?

upodroid · 2023-12-05T15:47:47Z

Another question about that: what's the typical frequency for periodics ? Daily, weekly ?

daily for the latest release, weekly for older supported releases or rare scenarios.

For kubespray in particular, I would test 2 or 3 scenarios for a proper e2e test in presubmits(runs on every push to a PR) and then run the e2e test matrix once a day or twice at most.

BenTheElder · 2023-12-05T18:26:26Z

same for kOps?
https://testgrid.k8s.io/sig-cluster-lifecycle-kops

This is overlooking the "not on presubmit" aspect of my comment. I'm well aware of the kops test matrix, that's exactly what I was thinking of.

That matrix is actually designed to minimally identify which aspect is broken and the tooling for it is in this repo.

i wouldn't want us to say -1 on kubespray if they want to move to prow, but if i could, i'd happily take 50% of the test bandwidth of kOps and give it to kubespray.

I don't think that's a reasonable dichotomy. kops has been using these resources in good faith as a long time participant in upstream test tooling, infra, etc.

Also, we're (SIG Testing + SIG K8s Infra) planning to use kops to replace kube-up because we desperately need to eliminate kube-up.sh and we need to be flexible in AWS+GCP spend, so we certainly don't want to reduce test coverage. (There is a KEP in flight)

BenTheElder · 2023-12-05T18:33:14Z

Another question about that: what's the typical frequency for periodics ? Daily, weekly ?

Do other projects have some strategy in place to avoid breakage in their main branch ? Having a separate "dev" branch for instance, only
merged in the main branch at the same frequency as the periodics run ?

Reasonably frequent on the main branch (multiple times per day), much less frequent on stable release branches with frequency decreasing for older releases (and none for out of support releases).

neolit123 · 2023-12-05T18:52:29Z

same for kOps?
https://testgrid.k8s.io/sig-cluster-lifecycle-kops

This is overlooking the "not on presubmit" aspect of my comment. I'm well aware of the kops test matrix, that's exactly what I was thinking of.

i agree with the comments from earlier that presubmit should be minimal and fail-fast.
no matrix testing.

That matrix is actually designed to minimally identify which aspect is broken and the tooling for it is in this repo.

i wouldn't want us to say -1 on kubespray if they want to move to prow, but if i could, i'd happily take 50% of the test bandwidth of kOps and give it to kubespray.

I don't think that's a reasonable dichotomy. kops has been using these resources in good faith as a long time participant in upstream test tooling, infra, etc.

it's not, but it's anecdotally hinting of fairness and non-bias. kubespray should not be denied bandwidth just because they are late for the party.

Also, we're (SIG Testing + SIG K8s Infra) planning to use kops to replace kube-up because we desperately need to eliminate kube-up.sh and we need to be flexible in AWS+GCP spend, so we certainly don't want to reduce test coverage. (There is a KEP in flight)

jobs such as https://testgrid.k8s.io/sig-cluster-lifecycle-kops#kops-grid-cilium-deb10-k27 would not be contributing much to the kube-up replacement picture. such jobs are effectively testing a user deployment scenario. they just guarantee to maintainers and users that a certain deployment scenario works, not that kOps itself works. i don't want to speak behind the intent of these jobs, though.

neolit123 · 2023-12-05T18:56:07Z

i don't think we have a way to measure how much $$ is generated per SIG, but i wild guess that SIG CL is a major contributor to our budget reduction due to how much subprojects and e2e test jobs we have... i would not be surprised if at some point we have to do some sort of evaluation and ask maintainers to limit how much they test.

BenTheElder · 2023-12-05T19:20:51Z

it's not, but it's anecdotally hinting of fairness and non-bias. kubespray should not be denied bandwidth just because they are late for the party.

The problem is moreso that we need to determine if we have bandwidth to spin things up (we probably don't at the moment -- AWS Spend is hitting the budget cap, but we're going to optimize costs) and we've already had to cut down on spend like scale testing this year unfortunately due to lack of options.

We shouldn't do more cutting of existing usage until we have a policy in place. (Though we can run equivalently with less cost e.g. committed use discounts). We need to have a framework in place before we start kicking things off, we haven't done that yet (because we've been too busy reacting to the ongoing issues).

As-is kubespray has running CI without us cutting any other CI off, so we don't have to choose between projects yet.

i don't think we have a way to measure how much $$ is generated per SIG, but i wild guess that SIG CL is a major contributor to our budget reduction due to how much subprojects and e2e test jobs we have... i would not be surprised if at some point we have to do some sort of evaluation and ask maintainers to limit how much they test.

This is a tricky topic, we have a lot of jobs that aren't really "benefitting" a single SIG.

would not be contributing much to the kube-up replacement picture. such jobs are effectively testing a user deployment scenario.

To that point, the cloud provider testing is specific to a particular vendor ... it's not going to be that simple to dismiss categories of testing. We have similar compat testing with cri-o and containerd. Ideally the project should select testing that benefits broadly but we do have to run with actual implementations eventually.

BenTheElder · 2023-12-05T19:36:31Z

So, I think we can run kubespray CI on prow, but it remains an open question how best to enable the test environments you need and how much we can afford.

I don't think that's kubevirt, we use managed k8s clusters because we have limited bandwidth to maintain these things and nested virt isn't enabled.

We can start with something small like unit tests so they can get familiar with prow and we don't need to worry too much about the resources needed for that.

For e2e testing:

When other projects spin up external assets they do so by renting resources through https://github.com/kubernetes-sigs/boskos typically through integration in https://github.com/kubernetes-sigs/kubetest2 to ensure that they will be automatically cleaned up if the CI job is abruptly terminated or otherwise fails to clean-up after itself.

This aspect is pretty important, I'd ask that we make sure boskos is used if / when e2e tests are setup. CAPI, kops, Kubernetes etc use this.
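The reason leasing matters can be sketched with a toy in-memory pool. This is NOT the boskos API; `ResourcePool`, `leased`, and the resource name are hypothetical and only illustrate the idea that a leased resource returns to the pool even when the job holding it disappears without cleaning up.

```python
import time
from contextlib import contextmanager


class ResourcePool:
    """Toy in-memory pool; not boskos, only the leasing idea."""

    def __init__(self, names, lease_seconds):
        self.lease_seconds = lease_seconds
        self.free = set(names)
        self.leases = {}  # resource name -> lease expiry timestamp

    def acquire(self, now=None):
        now = time.monotonic() if now is None else now
        self.reap(now)
        if not self.free:
            raise RuntimeError("no free resources")
        name = sorted(self.free)[0]
        self.free.remove(name)
        self.leases[name] = now + self.lease_seconds
        return name

    def release(self, name):
        # Releasing an unknown or already released resource is a no-op.
        if self.leases.pop(name, None) is not None:
            self.free.add(name)

    def reap(self, now=None):
        """Reclaim leases whose holder vanished without releasing."""
        now = time.monotonic() if now is None else now
        for name, expiry in list(self.leases.items()):
            if expiry <= now:
                self.release(name)


@contextmanager
def leased(pool):
    """Hold a resource for the duration of a `with` block."""
    name = pool.acquire()
    try:
        yield name
    finally:
        pool.release(name)
```

The `leased` helper covers the normal path, while `reap` covers the case where the job is terminated abruptly and never reaches its own cleanup.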

BenTheElder · 2023-12-07T16:41:16Z

aside re: freeing up resources for CI etc ... we dug into our expenditures in the bi-weekly k8s infra call yesterday and the main outcome is going on here kubernetes/k8s.io#6165

I think we can easily run things like build/unit test/lint on prow already but it will take more work to setup a suitable environment for the e2e tests. We haven't used packet/equinix from prow before but that might be an option for running essentially the same e2e environment.

What if we ran a build cluster on equinix w/ kubevirt? Would the kubespray team be up for maintaining this?
I think prow as-is can handle scheduling to a cluster like this fine.

We probably need to discuss options more between k8s infra and sig testing calls.

ant31 · 2023-12-08T15:16:21Z

What if we ran a build cluster on equinix w/ kubevirt? Would the kubespray team be up for maintaining this?
I think prow as-is can handle scheduling to a cluster like this fine.

Yes, it would work I think. I don't know prow well enough to know what would need to change, if anything.
So I'll describe the current CI:
new PR --> Gitlab CI triggered -> Gitlab schedules jobs on gitlab-runners deployed on equinix
A job is started and the runner executes the following:

Create kubevirt VM via kubectl apply
Wait for VM to be up
Deploy kubernetes on the VM
Test the cluster
Destroy VM

if there's an equivalent of gitlab-runner for prow(prow-runner?) deployed on that cluster, then it would use the resources that kubespray has already without adding loads/expenses on k8s-infra

As nice to have, maybe step 1,2 and 5 could be handled by prow so it's easily reproduced by all projects (to create kubevirt VM). In all cases it's not a blocker.
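The five steps listed above can be sketched as a small Python wrapper. Every step is passed in as a callable, so nothing here assumes how the VM is actually created or destroyed; `run_e2e_job` and its parameter names are hypothetical.

```python
def run_e2e_job(create_vm, wait_until_ready, deploy, test, destroy_vm):
    """Run the five job steps; each argument is a caller-supplied callable."""
    vm = create_vm()              # step 1: create the VM
    try:
        wait_until_ready(vm)      # step 2: wait for the VM to be up
        deploy(vm)                # step 3: deploy kubernetes on the VM
        test(vm)                  # step 4: test the cluster
    finally:
        destroy_vm(vm)            # step 5: runs even if steps 2-4 raise


if __name__ == "__main__":
    log = []
    run_e2e_job(
        create_vm=lambda: "vm-1",
        wait_until_ready=lambda vm: log.append(f"ready {vm}"),
        deploy=lambda vm: log.append(f"deploy {vm}"),
        test=lambda vm: log.append(f"test {vm}"),
        destroy_vm=lambda vm: log.append(f"destroy {vm}"),
    )
    print(log)
```

Note that a `finally` block only helps when the job fails normally. If the process running the job is killed outright, step 5 never runs, which is why cleanup usually also needs something outside the job itself.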

VannTen · 2023-12-08T21:36:08Z

What if we ran a build cluster on equinix w/ kubevirt? Would the kubespray team be up for maintaining this?

I don't know about the other kubespray contributors, but I could participate. I have some dedicated time for upstream work + the down-time, and my main occupation is maintaining clusters anyway.

VannTen · 2023-12-15T09:38:05Z

We haven't used packet/equinix from prow before but that might be an option for running essentially the same e2e environment.

What if we ran a build cluster on equinix w/ kubevirt? Would the kubespray team be up for maintaining this? I think prow as-is can handle scheduling to a cluster like this fine.

In that case, would the same constraints (mainly, moving stuff to periodics) apply ?

k8s-triage-robot · 2024-03-14T10:03:15Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

VannTen · 2024-03-14T10:05:00Z

/remove-lifecycle stale

BenTheElder · 2024-05-30T18:51:04Z

In that case, would the same constraints (mainly, moving stuff to periodics) apply ?

I don't think we have transparent budget info or credentials for equinix in SIG K8s infra currently so it's hard to say, AFAIK that's similarly ~CNCF, like your current gitlab instance, rather than Kubernetes owned/managed.

cc @dims who has the only Kubernetes related equinix infra I've previously seen (cs.k8s.io, a single machine AFAIK).

BenTheElder · 2024-05-30T18:52:47Z

(We are still planning the migration of prow control plane to k8s infra this year, amongst other things, I'm personally a bit over-extended WRT k8s infra but I'm not the only lead, I know Arnaud is out for a while currently)

k8s-triage-robot · 2024-08-28T18:56:37Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

BenTheElder · 2024-08-28T19:47:24Z

/remove-lifecycle stale

We've ~all but completed the CI migration, we need to rotate the log bucket and we have the same issue for release binaries behind fastly.

Both are in progress, the preliminary work is in place but there's a lot of lingering references.

We should be starting to get a stable idea where our usage is at.

We now have spend reporting for GCP, AWS, Fastly, Digital Ocean and Azure, with budgets known except DO.

I think we still have a very small presence on equinix currently, just cs.k8s.io, which is one VM in @dims's hands, so it's not really actively tracked yet.

Things have settled quite a bit and we should really revisit this.

We create and dispose a LOT of VMs on GCP and AWS every day using projects/accounts rented from https://github.com/kubernetes-sigs/boskos

I would still recommend exploring a gradual transition with lighter and simpler workloads first and make sure the merge robot and so on are working and we can continue to explore e2e testing in parallel.

For the most part we're looking at projects to file in https://github.com/kubernetes/k8s.io for bespoke infra needs, for basic CI jobs #sig-testing can help, slack is a good bit more active than this issue tracker at the moment.

VannTen · 2024-10-04T12:49:25Z

Thanks for the update and info 👍

To give one from kubespray side:

There has been a lot of work on our CI, and other improvements planned, but it currently looks like we will use **more** features of our current setup (gitlab-ci features + the fact we're running the test and the provisioned Kubevirt VMs in the same cluster), which I think will make a migration to prow less desirable.

For now, I think we can freeze this, if that's ok with you, and see where we are once our CI reworkings have settled a bit.

/lifecycle frozen

upodroid · 2024-10-04T15:29:23Z

There has been a lot of work on our CI, and other improvements planned, but it currently looks like we will use more features of our current setup (gitlab-ci features + the fact we're running the test and the provisioned Kubevirt VMs in the same cluster), which I think will make a migration to prow less desirable.

Can you tell us more about this? We would like to know more about this and see how prow fits in this. We recently helped etcd project adopt prow and we enabled the use cases they needed for a successful migration.

VannTen · 2024-10-05T21:51:47Z

Sure. Compared to last time, the following has changed:

- We now have working /retest, /retest-failed and /ok-to-test (thanks to @ant31's work on failfast). There are some hiccups though 🤔
- the pipelines have been redesigned to use gitlab-ci `needs` rather than stages (aka, it's a DAG of jobs)
- we have three "levels" of testing using labels ('ci-short|extended|full') -> still needs some stuff to configure, notably the label plugin.
- some cleanups of obsolete stuff

(The last three have made the pipelines much faster)

In the works:

- reworking the CI resources cleanup. It's currently racy, which makes PR pipelines flaky. -> kubernetes-sigs/kubespray#11530 (TL;DR: use ownerRefs on kubevirt VMs so they're deleted when the pod running the job is)
- distributed cache. We have had a lot of flakes recently because we use vagrant for some jobs and we're rate-limited by the vagrant box hosting, probably because we're downloading the same boxes again and again. This should solve that and accelerate some jobs as well.

Regarding switching to prow, if we don't consider the work of switching itself, there are pros and cons:

Pros:

- conditional running (== if some files changed). Currently we lose the information in the GitHub -> gitlab integration
- merge pool

Cons:

- no DAG support (AFAIK)
- more generally, .gitlab-ci.yml has more features than prowjobs, I believe. We don't use all of them, but I do think we would need to re-think some stuff to fit in prow.

I'm probably forgetting some stuff, but that is what comes to mind at the moment.
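To illustrate what a DAG of jobs buys in terms of failing early, here is a minimal Python sketch using the standard library's `graphlib`. It is neither GitLab CI nor Prow behavior; the job names and the `run_dag` helper are hypothetical. A job is skipped as soon as any job it needs did not pass, so expensive downstream jobs never start.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it needs.
NEEDS = {
    "lint": set(),
    "unit": set(),
    "deploy-default": {"lint", "unit"},
    "deploy-matrix": {"deploy-default"},
}


def run_dag(needs, run_job):
    """Run jobs in dependency order; skip any job whose needs did not all pass."""
    results = {}
    # static_order() yields every job after all the jobs it needs.
    for job in TopologicalSorter(needs).static_order():
        if any(results[dep] != "passed" for dep in needs.get(job, ())):
            results[job] = "skipped"
        else:
            results[job] = "passed" if run_job(job) else "failed"
    return results


if __name__ == "__main__":
    # Pretend the unit tests fail: both deploy jobs are skipped.
    print(run_dag(NEEDS, lambda job: job != "unit"))
```

With independent presubmits and no dependency concept, the equivalent saving has to come from keeping the expensive jobs out of presubmit altogether.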

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 1, 2023

VannTen changed the title ~~Question on migrating kubespray CI to test-infra~~ Questions on migrating kubespray CI to test-infra Dec 1, 2023

k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 1, 2023

k8s-ci-robot added the sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. label Dec 1, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 28, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 28, 2024

k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Oct 4, 2024
