Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current CI builds slightly different sources than the one in PR build #10471

Closed
potiuk opened this issue Aug 22, 2020 · 5 comments · Fixed by #11268
Closed

Current CI builds slightly different sources than the one in PR build #10471

potiuk opened this issue Aug 22, 2020 · 5 comments · Fixed by #11268
Assignees
Labels
area:CI Airflow's tests and continious integration kind:bug This is a clearly a bug

Comments

@potiuk
Copy link
Member

potiuk commented Aug 22, 2020

The current CI system builds a slightly different set of sources in GitHub than the one in PR directly. But possibly it is better. We can fix it with a workaround but I am not sure if we should.

Details:

  1. when we currently build the image in "Build Image" workflow we are using "scripts/ci" from master and rest of the sources from the commit that is the HEAD of PR.

  2. However the sources in the PR run by GitHub are slightly different. When there are no conflicts, Github actually performs a merge between the master and the PR and the sources in build PR are those merged sources. But this only happens when there is no conflict, otherwise, original sources are used.

This is not a big issue IMHO now and I'd argue that it's better if we run original sources, because that might be a source of confusion if someone tries to reproduce it locally. What you have to do to replicate the PR build, you have to rebase it locally on top of latest master. But this is not at all clear - I just learned that this is happening while implementing the new CI and had no idea that this was happening - and it could have explained a number of "We do not know what happened in this CI" cases.

I can workaround this (there's no API in Github to know what is the merge request Commit SHA, but I Already use similar workarounds passing information via job names (waiting for missing API from GitHub). But I am not sure if it is worth it.

For now it might cause problems similar to the ones in #10445 but a fix is coming in #10472 so that such failures will not happen but we will use the HEAD of PR as commit SHA still. Possibly also #10806

Another problem that was caused by this was https://github.com/apache/airflow/runs/1108600218?check_suite_focus=true resulting from merging #10898. In this PR new packages were added, and later not rebased PR which has been just merged, and the PR #10906 which was not rebased on top of it resulted in Pylint being run on "merged" sources with an image that did not contain the "merged" dependencies.

@kaxil @feluelle @turbaszek @mik-laj @dimberman - WDYT.

BTW. We also have a safety net. The build with "merged" sources happens always after we merge the PR anyway - so in case a problem is hidden we will see it failing at "push" event.

@potiuk potiuk added kind:bug This is a clearly a bug area:CI Airflow's tests and continious integration labels Aug 22, 2020
potiuk added a commit to PolideaInternal/airflow that referenced this issue Aug 23, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the apache#10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the apache#10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue apache#10471 discusses this in detail.
potiuk added a commit that referenced this issue Aug 24, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the #10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the #10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue #10471 discusses this in detail.
@potiuk
Copy link
Member Author

potiuk commented Sep 13, 2020

I might want to try to fix that one if we have more problems like this.. The problems will manifest in sometimes weird errors if the PR has not been rebased to the latest master.

@potiuk
Copy link
Member Author

potiuk commented Sep 13, 2020

For example missing dependencies that has just been added. In any case, the workaround is simply to rebase such PR to the latest master, but I think I saw several of those happening recently, pretty much always when we add new dependencies, so it might be worth fixing.

@potiuk potiuk self-assigned this Sep 13, 2020
potiuk added a commit to potiuk/airflow that referenced this issue Sep 14, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the apache#10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the apache#10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue apache#10471 discusses this in detail.

(cherry picked from commit 4fa7df5)
potiuk added a commit that referenced this issue Sep 15, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the #10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the #10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue #10471 discusses this in detail.

(cherry picked from commit 4fa7df5)
@ashb
Copy link
Member

ashb commented Sep 16, 2020

when we currently build the image in "Build Image" workflow we are using "scripts/ci" from master

I haven't yet had a chance to look in to this new Build Image workflow, but is it not possible to have this run as a second workflow file on PRs, rather than running it against master?

@ashb
Copy link
Member

ashb commented Sep 16, 2020

This is not a big issue IMHO now and I'd argue that it's better if we run original sources

More predictable, perhaps, but less useful overall: By running against the result of merging master and the PR, the tests effectively test "will this work when merged to master" which is more important than just "doe the tests pass on the branch".

By only testing the PR directly we're more likely to have broken master builds that we don't notice straight away (I don't know about anyone else, but when I merge a PR I don't even think about checking the build status on master 20-40mins later when it runs.), so changing to only test PR, not the merge result, would lead to more work needing to track down when the build started failing.

@potiuk
Copy link
Member Author

potiuk commented Sep 16, 2020

Yeah - agree it is less useful. Looking at the number of issues it caused - they are not very big, but looking at how difficult it is to diagnose them and explain them to users, I am also for doing what GitHub is doing by default (i.e. merging). I will take a look at how I can implement it in an easiest and robust way.

potiuk added a commit to potiuk/airflow that referenced this issue Oct 4, 2020
The PR builds are now better handled.

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
potiuk added a commit to potiuk/airflow that referenced this issue Oct 4, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
potiuk added a commit to potiuk/airflow that referenced this issue Oct 4, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
potiuk added a commit to potiuk/airflow that referenced this issue Oct 4, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
potiuk added a commit to PolideaInternal/airflow that referenced this issue Oct 4, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
potiuk added a commit that referenced this issue Oct 4, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: #10471
potiuk added a commit that referenced this issue Oct 6, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: #10471
(cherry picked from commit a4478f5)
RaviTezu pushed a commit to RaviTezu/airflow that referenced this issue Oct 25, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the apache#10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the apache#10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue apache#10471 discusses this in detail.

(cherry picked from commit 4fa7df5)
RaviTezu pushed a commit to RaviTezu/airflow that referenced this issue Oct 25, 2020
…1268)

The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
(cherry picked from commit a4478f5)
kaxil pushed a commit that referenced this issue Nov 12, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the #10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the #10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue #10471 discusses this in detail.

(cherry picked from commit 4fa7df5)
kaxil pushed a commit that referenced this issue Nov 12, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: #10471
(cherry picked from commit a4478f5)
potiuk added a commit that referenced this issue Nov 16, 2020
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the #10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the #10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue #10471 discusses this in detail.

(cherry picked from commit 4fa7df5)
potiuk added a commit that referenced this issue Nov 16, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: #10471
(cherry picked from commit a4478f5)
kaxil pushed a commit that referenced this issue Nov 18, 2020
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: #10471
(cherry picked from commit a4478f5)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this issue Mar 5, 2021
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the apache#10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the apache#10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue apache#10471 discusses this in detail.

(cherry picked from commit 4fa7df5)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this issue Mar 5, 2021
…1268)

The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add cancel job for the CodeQL duplicate
messages. They run for ~ 12 miinutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in a
quick succession.

Fixes: apache#10471
(cherry picked from commit a4478f5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CI Airflow's tests and continious integration kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants