-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Current CI builds slightly different sources than the one in PR build #10471
Comments
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the apache#10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the apache#10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue apache#10471 discusses this in detail.
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the #10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the #10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue #10471 discusses this in detail.
I might want to try to fix that one if we have more problems like this.. The problems will manifest in sometimes weird errors if the PR has not been rebased to the latest master. |
For example missing dependencies that has just been added. In any case, the workaround is simply to rebase such PR to the latest master, but I think I saw several of those happening recently, pretty much always when we add new dependencies, so it might be worth fixing. |
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the apache#10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the apache#10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue apache#10471 discusses this in detail. (cherry picked from commit 4fa7df5)
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the #10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the #10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue #10471 discusses this in detail. (cherry picked from commit 4fa7df5)
I haven't yet had a chance to look in to this new Build Image workflow, but is it not possible to have this run as a second workflow file on PRs, rather than running it against master? |
More predictable, perhaps, but less useful overall: By running against the result of merging master and the PR, the tests effectively test "will this work when merged to master" which is more important than just "doe the tests pass on the branch". By only testing the PR directly we're more likely to have broken master builds that we don't notice straight away (I don't know about anyone else, but when I merge a PR I don't even think about checking the build status on master 20-40mins later when it runs.), so changing to only test PR, not the merge result, would lead to more work needing to track down when the build started failing. |
Yeah - agree it is less useful. Looking at the number of issues it caused - they are not very big, but looking at how difficult it is to diagnose them and explain them to users, I am also for doing what GitHub is doing by default (i.e. merging). I will take a look at how I can implement it in an easiest and robust way. |
The PR builds are now better handled. First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: #10471
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: #10471 (cherry picked from commit a4478f5)
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the apache#10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the apache#10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue apache#10471 discusses this in detail. (cherry picked from commit 4fa7df5)
…1268) The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471 (cherry picked from commit a4478f5)
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the #10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the #10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue #10471 discusses this in detail. (cherry picked from commit 4fa7df5)
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: #10471 (cherry picked from commit a4478f5)
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the #10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the #10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue #10471 discusses this in detail. (cherry picked from commit 4fa7df5)
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: #10471 (cherry picked from commit a4478f5)
The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: #10471 (cherry picked from commit a4478f5)
We had to enable mounting from sources for a short while because we had to find a way to add new scripts to the "workflow_run" workflow we have. This also requires the apache#10470 to be merged - perf_kit to be moved to tests.utils because it was in a separate directory and image without mounting sources could not run the tests. It also partially addresses the apache#10445 problem where there was difference between sources in the image and coming from the master. This comes from GitHub running merge on non-conflicting changes in the PR and something that will be addressed shortly. The issue apache#10471 discusses this in detail. (cherry picked from commit 4fa7df5)
…1268) The PR builds are now better handled with regards to both running (using merge-request) and canceling (with cancel notifications). First of all we are using merged commit from the PR, not the original commit from the PR. Secondly - the workflow run notifies the original PR with comment stating that the image is being built in a separate workflow - including the link to that workflow. Thirdly - when canceling duplicate PRs or PRs with failed jobs, the workflow will add a comment to the PR stating the reason why the PR is being cancelled. Last but not least, we also add cancel job for the CodeQL duplicate messages. They run for ~ 12 miinutes so it makes perfect sense to also cancel those CodeQL jobs for which someone pushed fixups in a quick succession. Fixes: apache#10471 (cherry picked from commit a4478f5)
The current CI system builds a slightly different set of sources in GitHub than the one in PR directly. But possibly it is better. We can fix it with a workaround but I am not sure if we should.
Details:
when we currently build the image in "Build Image" workflow we are using "scripts/ci" from master and rest of the sources from the commit that is the HEAD of PR.
However the sources in the PR run by GitHub are slightly different. When there are no conflicts, Github actually performs a merge between the master and the PR and the sources in build PR are those merged sources. But this only happens when there is no conflict, otherwise, original sources are used.
This is not a big issue IMHO now and I'd argue that it's better if we run original sources, because that might be a source of confusion if someone tries to reproduce it locally. What you have to do to replicate the PR build, you have to rebase it locally on top of latest master. But this is not at all clear - I just learned that this is happening while implementing the new CI and had no idea that this was happening - and it could have explained a number of "We do not know what happened in this CI" cases.
I can workaround this (there's no API in Github to know what is the merge request Commit SHA, but I Already use similar workarounds passing information via job names (waiting for missing API from GitHub). But I am not sure if it is worth it.
For now it might cause problems similar to the ones in #10445 but a fix is coming in #10472 so that such failures will not happen but we will use the HEAD of PR as commit SHA still. Possibly also #10806
Another problem that was caused by this was https://github.com/apache/airflow/runs/1108600218?check_suite_focus=true resulting from merging #10898. In this PR new packages were added, and later not rebased PR which has been just merged, and the PR #10906 which was not rebased on top of it resulted in Pylint being run on "merged" sources with an image that did not contain the "merged" dependencies.
@kaxil @feluelle @turbaszek @mik-laj @dimberman - WDYT.
BTW. We also have a safety net. The build with "merged" sources happens always after we merge the PR anyway - so in case a problem is hidden we will see it failing at "push" event.
The text was updated successfully, but these errors were encountered: