Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI conda: On pull_request, only run 1 macOS job and 1 Linux job #36694

Merged
merged 2 commits into from
Feb 25, 2024

Conversation

mkoeppe
Copy link
Contributor

@mkoeppe mkoeppe commented Nov 10, 2023

as discussed in #36678 (comment), #36616 (comment)

On pushes to a tag, all 3 Linux and 3 macOS jobs are still run, as demonstrated in https://github.com/mkoeppe/sage/actions/runs/6829619271

After merging, the branch protection rule https://github.com/sagemath/sage/settings/branch_protection_rules/33965567 will need to be updated. It controls what workflows show as "Required" in the PR Checks.

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

Copy link
Contributor

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for a the time after a release, there is no shortage of macos runners. So I don't think we need to change anything there atm. (Also in my experience there was more of a difference between the behavior on different python versions, e.g. some of the failing tests only occurred under python 3.11 but not on other python versions)

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 11, 2023

What exactly is the change you're requesting here?

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 11, 2023

Except for a the time after a release, there is no shortage of macos runners.

This is clearly false. For example, right now (the macOS jobs from the release have finished already), https://github.com/sagemath/sage/actions/workflows/ci-conda.yml?query=is%3Aqueued shows 16 CI conda workflows waiting for 36 macOS runners. That's over 7 hours of backlog.

in my experience there was more of a difference between the behavior on different python versions, e.g. some of the failing tests only occurred under python 3.11 but not on other python versions)

That doesn't imply that all python versions have to run on all PRs.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 12, 2023

I think we could also reduce Linux in PRs to one Python version.
Since Build & Test currently uses Python 3.11, we could do conda Linux with Python 3.9 and conda macOS with Python 3.10, so that we still have the nice coverage of the supported minor Python versions.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 14, 2023

I think we could also reduce Linux in PRs to one Python version.
Since Build & Test currently uses Python 3.11, we could do conda Linux with Python 3.9 and conda macOS with Python 3.10, so that we still have the nice coverage of the supported minor Python versions.

Done in a011089

@mkoeppe mkoeppe changed the title CI conda: On pull_request, only run 1 macOS job CI conda: On pull_request, only run 1 macOS job and 1 Linux job Nov 14, 2023
Copy link
Collaborator

@kwankyu kwankyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM.

@tobiasdiez
Copy link
Contributor

As described above, I consider this a huge step backwards.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 16, 2023

As described above, I consider this a huge step backwards.

No, Tobias, you haven't said anything like that.

@kwankyu
Copy link
Collaborator

kwankyu commented Nov 23, 2023

Currently, conda ci is the main source that breaks and delays the checks.

Let's get this in.

@tobiasdiez
Copy link
Contributor

Currently, conda ci is the main source that breaks and delays the checks.

This PR is not helping with fixing any of the random breaks. In contrast, it makes it very hard to spot if a failure is random or actual a result of the changes in the PR.

Let me know if you see consistent failures at #36372 and I'll add exclude them (or better yet, provide a fix).

If the conda workflow is too slow, I propose we remove the "max-parallel" constraint. Would also help with discovering the last random failures.

@tobiasdiez
Copy link
Contributor

If the conda workflow is too slow, I propose we remove the "max-parallel" constraint. Would also help with discovering the last random failures.

That's now #36767.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 25, 2023

And no, it is not hard for our developers to spot whether a failure is random or not.

As you did in #36765 (review)?

@tobiasdiez Would you explain what you mean by this comment?

vbraun pushed a commit to vbraun/sage that referenced this pull request Dec 10, 2023
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

Reduces the executation time of the conda workflow to 2h, so that it is
quicker than the "Build & Test" workflow and devs no longer have to wait
5h until their PR turns green. It also reduces the likelihood that the
conda workflows queue up as now all available macos runners are utilized
immediately.
From
sagemath#36694 (comment).

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36767
Reported by: Tobias Diez
Reviewer(s):
vbraun pushed a commit to vbraun/sage that referenced this pull request Dec 13, 2023
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

Reduces the executation time of the conda workflow to 2h, so that it is
quicker than the "Build & Test" workflow and devs no longer have to wait
5h until their PR turns green. It also reduces the likelihood that the
conda workflows queue up as now all available macos runners are utilized
immediately.
From
sagemath#36694 (comment).

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36767
Reported by: Tobias Diez
Reviewer(s):
@mkoeppe mkoeppe added the disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ label Dec 23, 2023
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 24, 2024
    
Not all of the added failures are conda-specific, but the 6 instances of
CI Conda amplify the error probability
(sagemath#36694 (comment))

We also add info (ideally a link to the GitHub Issue where the failure
is reported) to the failures and arrange for it to be printed by the
doctester as part of the `[failed in baseline]` message.

Marked as blocker so it takes effect in CI runs -- to reduce the noise
from failures of the Conda CI that PR authors and reviewers find
distracting.

<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36936 (which refactors the `[failed in baseline]`
printing)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36937
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe, Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 27, 2024
    
Not all of the added failures are conda-specific, but the 6 instances of
CI Conda amplify the error probability
(sagemath#36694 (comment))

We also add info (ideally a link to the GitHub Issue where the failure
is reported) to the failures and arrange for it to be printed by the
doctester as part of the `[failed in baseline]` message.

Marked as blocker so it takes effect in CI runs -- to reduce the noise
from failures of the Conda CI that PR authors and reviewers find
distracting.

<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36936 (which refactors the `[failed in baseline]`
printing)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36937
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe, Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 29, 2024
    
Not all of the added failures are conda-specific, but the 6 instances of
CI Conda amplify the error probability
(sagemath#36694 (comment))

We also add info (ideally a link to the GitHub Issue where the failure
is reported) to the failures and arrange for it to be printed by the
doctester as part of the `[failed in baseline]` message.

Marked as blocker so it takes effect in CI runs -- to reduce the noise
from failures of the Conda CI that PR authors and reviewers find
distracting.

<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36936 (which refactors the `[failed in baseline]`
printing)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36937
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe, Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 30, 2024
    
Not all of the added failures are conda-specific, but the 6 instances of
CI Conda amplify the error probability
(sagemath#36694 (comment))

We also add info (ideally a link to the GitHub Issue where the failure
is reported) to the failures and arrange for it to be printed by the
doctester as part of the `[failed in baseline]` message.

Marked as blocker so it takes effect in CI runs -- to reduce the noise
from failures of the Conda CI that PR authors and reviewers find
distracting.

<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36936 (which refactors the `[failed in baseline]`
printing)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36937
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe, Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 1, 2024
    
Not all of the added failures are conda-specific, but the 6 instances of
CI Conda amplify the error probability
(sagemath#36694 (comment))

We also add info (ideally a link to the GitHub Issue where the failure
is reported) to the failures and arrange for it to be printed by the
doctester as part of the `[failed in baseline]` message.

Marked as blocker so it takes effect in CI runs -- to reduce the noise
from failures of the Conda CI that PR authors and reviewers find
distracting.

<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36936 (which refactors the `[failed in baseline]`
printing)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36937
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe, Tobias Diez
Copy link

Documentation preview for this PR (built with commit a49a7c1; changes) is ready! 🎉

@vbraun
Copy link
Member

vbraun commented Feb 18, 2024

Editor decision

CI is going to be more useful if it gives results faster. Its important to search for randomly failing tests, but that should be separate from CI.

@tobiasdiez
Copy link
Contributor

@vbraun Please note that the execution times posted above are outdated. With #36767 the situation improved dramatically. Apart from the time after a release (where runners are heavily used by the portability tests), all three linux runs always finish around the 1h mark. In fact, since we have 60 runners, we would need to have pushes to 12 PRs per hour (one PR triggers 5 linux workflows) in order for the runs to pile up. The situation for macos is similar, although they do sometimes pile up a little bit. But, generally, they do finish around the 3h/4h mark, which is only a bit slower than the Build & Test run. You can see the latest run times at https://github.com/sagemath/sage/actions/workflows/ci-conda.yml.

Its important to search for randomly failing tests, but that should be separate from CI.

In theory, I very much agree. In practice, however, the free CI github infrastructure is the only place for reproducible and convenient mass-scale tests we have at the moment. In fact, me watching the conda runs has lead to identifying and fixing a lot of random test failures (all of which don't seem to be conda specific), see https://github.com/sagemath/sage/issues?q=is%3Aissue+author%3Atobiasdiez+. Randomization as a means to increase test coverage naturally has the tradeoff to trigger sporadic failures in situations where you don't want them.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Feb 19, 2024

all of which don't seem to be conda specific

If they are not conda specific, why would you need to watch conda workflows to catch them?
(I don't need an answer.)

@tobiasdiez
Copy link
Contributor

tobiasdiez commented Feb 19, 2024

all of which don't seem to be conda specific

If they are not conda specific

As you confirmed yourself (#37071, #37068) and confrimed eg in #37360, they are not conda-specific.

why would you need to watch conda workflows to catch them?

The conda workflow is very convenient for this purpose since we don't spend the majority of the time building dependencies (as in your portability runs). So they run very fast (1h for build and testing). But, of course, if you have a nice grant to pay for computational resources, I would very much appreciate if you could setup a stress test with conda or without, and report all issues you find that way.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Feb 19, 2024

OK, sounds we are all in agreement that it makes no sense whatsoever to run this activity as part of the CI conda on PR.

Thanks, Volker.

vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 19, 2024
…nd 1 Linux job

    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
as discussed in
sagemath#36678 (comment),
sagemath#36616 (comment)

On pushes to a tag, all 3 Linux and 3 macOS jobs are still run, as
demonstrated in https://github.com/mkoeppe/sage/actions/runs/6829619271

<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

After merging, the branch protection rule https://github.com/sagemath/sa
ge/settings/branch_protection_rules/33965567 will need to be updated. It
controls what workflows show as "Required" in the PR Checks.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36694
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Tobias Diez
@vbraun vbraun merged commit eabd2bf into sagemath:develop Feb 25, 2024
13 of 15 checks passed
@mkoeppe mkoeppe added this to the sage-10.3 milestone Mar 7, 2024
@mkoeppe mkoeppe deleted the ci_conda_reduce_macos branch May 11, 2024 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: scripts disputed PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ p: critical / 2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants