Whither scala/scala CI? #751
It might be worth looking at what GHC has done: they're in a somewhat similar position of having limited resources but aspiring to check as many configurations as possible.
As far as the test suite goes, it would be super helpful to have some way of tagging particular tests so they only run on a specific configuration (JDK version, OS, etc.). Even if we don't have a CI configuration that tests those, the tests provide a good way of bisecting for regressions (or running costly configurations infrequently).
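To make the tagging idea concrete, here is a minimal sketch in plain Java of a gate that decides whether a test should run on the current JDK and OS. Everything in it (the `ConfigGate` class, `featureVersion`, `isWindows`, `shouldRun`, and the "minimum JDK / Windows only" tag shape) is hypothetical and is not how scala/scala currently does it. In a JUnit 4 test, the boolean could plausibly be passed to `org.junit.Assume.assumeTrue` so the test is reported as skipped rather than failed.
```java
import java.util.Locale;

// Hypothetical helper: not part of scala/scala, only an illustration.
class ConfigGate {

    // Turns a java.specification.version value into a feature number:
    // "1.8" -> 8 (legacy scheme), "11" -> 11, "17" -> 17.
    static int featureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    // True when the os.name value looks like a Windows system.
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Assumed tag shape: a minimum JDK feature version plus an optional
    // "Windows only" flag. A real tagging scheme could be richer.
    static boolean shouldRun(int minJdk, boolean windowsOnly, String specVersion, String osName) {
        return featureVersion(specVersion) >= minJdk
            && (!windowsOnly || isWindows(osName));
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        String os = System.getProperty("os.name");
        boolean run = shouldRun(17, false, spec, os);
        System.out.println(run ? "running test tagged JDK 17+" : "skipping test tagged JDK 17+");
    }
}
```
Because the gate takes the version and OS strings as parameters instead of reading system properties itself, the same check can be exercised on any machine, which is handy when bisecting a regression that only shows up on a configuration CI doesn't cover.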
I tried out https://github.com/philips-labs/terraform-aws-github-runner a few weeks (months?) ago and it worked very well. In short it sets up everything in an AWS account to use on-demand EC2 (spot) instances as custom runners for GitHub Actions. Some challenges
One thing the project doesn't currently support is starting different instance types based on the runner label in GitHub, because the GitHub API doesn't provide the necessary data (philips-labs/terraform-aws-github-runner#518). This might be useful for the community build, and for Windows testing (philips-labs/terraform-aws-github-runner#347).
note that if anyone (perhaps someone whose initials are M.H.) is thinking of adding anything to our CI currently, favor adding it to Travis-CI, not Jenkins. the Jenkins build is monolithic and there are some pretty archaic scripts involved. whereas the way Travis-CI is set up is pretty close to how we would do it on GitHub Actions
moving Windows testing off Jenkins and onto GitHub Actions is happening at scala/scala#9496 and scala/scala#9485 (merged on 2.12.x, will be merged forward soon). it is run on merged PRs; it is not part of PR validation
and adding JDK 16 (and perhaps 11) to the 2.13 Travis-CI matrix has landed at scala/scala#9579
I have updated the issue description above to remove out-of-date information.
we replaced JDK 16 with 17
Travis-CI status update: note that Travis-CI will no longer offer "concurrency-based" plans except to existing customers who already have them: https://blog.travis-ci.com/2021-12-01-pricingenhancements
we're grandfathered in so this doesn't change anything for us at the moment, but it does indicate that the "1 job at a time" plan we're on might disappear entirely someday
even if it does, it might not be a bad change, given that the Travis-CI runs we actually need (namely, release runs) we only run rarely, so usage-based pricing might be okay
anyway, something to keep one eye on
https://app.travis-ci.com/github/scala shows that we're down to just scala, scala-dist, and scala-dist-smoketest
Summary:
It wouldn't be super hard to move PR validation entirely to GitHub Actions, and just leave Jenkins in place to publish the PR snapshots (without running tests), and use Travis-CI only for publishing. This would decrease the overall complexity by putting as many eggs as possible in the GitHub Actions basket, but:
Under the circumstances, doing nothing (until some future occurrence forces our hand) may actually make the most sense, despite having three CI systems being manifestly absurd 🤷 I think if Jenkins were to implode we would probably decide to just do without the PR snapshots, but... it hasn't imploded.
Any reason we can't publish snapshots to Sonatype, with GitHub Actions?
To publish anything anywhere, just about, you need a publishing secret, but PR runs on GitHub Actions understandably are not given secrets access. We got around this on Jenkins by storing the secrets on the worker nodes in such a way that made them inaccessible even to hostile code in an attacker's PR. It isn't obvious whether there's some trick or workaround we could use on GitHub Actions to give the PR jobs publishing permission in a sufficiently safe way.
Travis-CI has restored general availability of n-jobs-at-a-time plans like the one we're on: https://blog.travis-ci.com/2022-03-02-concurrentpricing (We still had 1-job-at-a-time because we were grandfathered in, but this increases confidence that it's unlikely to be taken away.)
I agree, whatever we invest should be towards the goal of getting rid of either Travis or (parts of) our AWS infra. Besides PR builds / validation, our AWS
Maybe others have had similar situations and found solutions?
Not sure where to put this. If you notice anything missing that could use improvement, feel free to edit this directly, or comment with suggestions or questions. Once we feel it's complete, perhaps we can find a place to put it.
This ticket replaces the similar older #507.
See also other tickets labeled CI/publishing/infra.
Basics for contributors
The redundancy is partly intentional (each system serves to check/verify that the other one is functioning as expected) and partly a historical accident (we are still experimenting with both and the experimentation hasn't concluded).
In particular, every commit in a PR must pass Jenkins. (Travis-CI only tests the merge commit.)
For certain PRs, a maintainer might also choose to manually trigger a Windows run (via GitHub Actions) and/or a community build run before merge.
The Jenkins build is monolithic, which means you only see "pass/fail", and you have to go digging in the logs to see where the failure was. On the other hand, if the problem is a test failure, the Jenkins UI splits out each test for you, so it's more digging initially but then less digging later.
The Travis-CI build is split into jobs: build and bootstrap, run partests, run junit and other tests, compile on Dotty... and we could plausibly split it up even further. You can see at a glance in the GitHub UI which part failed.
When digging through logs, there are other minor ergonomic differences between the two UIs.
See "Differences..." below.
This happens even if some tests fail. See the scala/scala README for information.
Differences between Jenkins and Travis-CI for PR validation
Jenkins, in combination with Scabot (which we built ourselves and operate ourselves), tests every commit in a PR. Travis-CI only tests the last commit. It is perhaps not strictly necessary that we require every commit in a PR to pass CI, but it is desirable.
Jenkins tests each commit in the PR's branch. Travis-CI tests a temporary merge commit of the PR's branch and the target branch (e.g. 2.13.x). When we hit "merge" the HEAD of the target branch may already have moved on, so that result may be stale.
Jenkins uses older, substantially more complicated scripts for bootstrapping (see the scripts directory). Travis-CI uses a newer, simpler method (see .travis.yml). The simpler method also more closely resembles how we advise contributors to bootstrap locally. In the long run, we should standardize on the simpler method, but the work of getting rid of the old stuff remains to be done.
For no special reason, only Travis-CI includes the compileWithDotty test, which verifies that the standard library compiles with the latest Scala 3 release.
Only Travis-CI builds the language spec.
How did we get here?
Originally, Jenkins (https://scala-ci.typesafe.com) was our only CI system. But we have to set up and maintain Jenkins ourselves (https://github.com/scala/scala-jenkins-infra) and pay to operate the EC2 instances, so Jenkins is costly for us in both labor and money.
So when the free Travis-CI service came into existence, we thought, let's try it! But we weren't ready to commit to it, so we kept Jenkins around.
Contributors only need to think about PR validation, but the core Scala team also needs a way to publish releases. Originally Scala releases were published from Jenkins, but circa 2018 we decided to move 2.12.x and 2.13.x publishing to Travis-CI, where it has remained ever since.
Why have we kept Jenkins?
Jenkins is a pain to maintain and a pain to expand our CI matrix on (e.g. to other JDK versions), and it's less familiar to most contributors these days than Travis-CI or GitHub Actions. Why do we still have it?
Reasons related to PR validation
Other reasons