Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6195] Fixed TaskInstance attrs not correct on UI #6758

Merged
merged 13 commits into from
Dec 11, 2019
Merged

[AIRFLOW-6195] Fixed TaskInstance attrs not correct on UI #6758

merged 13 commits into from
Dec 11, 2019

Conversation

baolsen
Copy link
Contributor

@baolsen baolsen commented Dec 9, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-XXX
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Fixed calculation of queued_dttm - was using the first/previous value from prior attempts, and not updating when tasks are re-queued or retried for some other reason. This is unexpected behavior, and although it was implemented specifically in that way it is not referenced elsewhere in the application so it seems this behavior is actually not needed. (queued_dttm is only referenced in the UI under Task Instance details - but even then it was not being displayed at all anyway due to below bug). Changed it to always be updated when a task is queued / re-queued which is more intuitive.

Fixed various fields not displaying correctly on the UI for "Task Instance Details".
Before (just displays None instead of value from DB):
image

After (displays value from DB):
image

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Tested manually by looking at the UI and comparing to the DB backend

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@potiuk
Copy link
Member

potiuk commented Dec 9, 2019

Thanks @baolsen! It looks reasonable, however I have not much experience with those parts of the UI so maybe others might want to take a look (@ashb @feluelle - I think you are a bit more familiar with it ;).

BTW. @baolsen answering your question from slack -> you cannot add reviewers if you are not maintainer/committer. But you can still add people in comments :).

@potiuk potiuk added the area:UI label Dec 9, 2019
@baolsen
Copy link
Contributor Author

baolsen commented Dec 9, 2019

Thanks @potiuk . FYI my build is failing due to a flaky test (since it ran fine in my repo), if you could rerun for me that would be great.

@potiuk
Copy link
Member

potiuk commented Dec 9, 2019

Restarted!

@baolsen
Copy link
Contributor Author

baolsen commented Dec 9, 2019

Ready to merge :)

@baolsen
Copy link
Contributor Author

baolsen commented Dec 9, 2019

Edit: Ready to merge after @ashb @feluelle have had a look :)

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further down in scheduler_job in _change_state_for_tasks_failed_to_execute we set TIs that were Queued but that didn't get picked up by the executor back to Scheduled -- I think it probably makes sense to set the queued_dttm to None there too.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise

@potiuk
Copy link
Member

potiuk commented Dec 10, 2019

Side comment @baolsen

FYI. We usually use rebase workflow during review rather than merge (https://www.atlassian.com/git/tutorials/merging-vs-rebasing).

While this is not as intuitive initially as merge (if you are used to it), it is pretty well supported by Github (you can see snippets of old commits even if they are gone). This means that you have to often rebase on-top of master rather than have merge commits and get used to push --force for your own branch :).

This has great benefit of being able to keep even multiple commits separated in ine PR and continuously rebased on top of latest master - which (if you do it often and change is small) is rather easy. Thenn when we review the code we can also review them commit-by-commit even if eventually they will get squashed as single commit. Having merge commits in between your commits makes it rather difficult to review it commit-by-commit - you have no option but review all of it.

Also if you add git commit --fixup + git rebase -i --autosquash to the whole toolset you get some powerful workflow that allows you to keep your work logically separated into several commits and work on several commits at the same time.

I switched to it few years ago and never looked back since.

@ashb
Copy link
Member

ashb commented Dec 10, 2019

@baolsen If you rebase (or merge, which ever you are most comfortable with) the static check error should be fixed - it was a bug in master. And ping me after it's done :)

@baolsen
Copy link
Contributor Author

baolsen commented Dec 11, 2019

Thanks guys, I have just recently learned about rebase, thanks to this project.
Here is the flow I am using currently, suggestions welcome :)

Once-off, set up my fork and configure upstream as follows
git remote -v
origin https://github.com/baolsen/airflow.git (fetch)
origin https://github.com/baolsen/airflow.git (push)
upstream https://github.com/apache/airflow.git (fetch)
upstream https://github.com/apache/airflow.git (push)

To keep my local master up to date:
git checkout master
git fetch upstream
git merge upstream/master master
Update my fork's master:
git push -f origin

To keep my feature branch up to date:
git checkout
git rebase -i master
(squash into 1 commit per commit guidelines)
git push -f origin

Here is where I think I'm going wrong.
I had thought that exactly 1 commit was needed.
So if I understand you correctly, the changes made because of code review should not be squashed into the first commit, i.e there should be 2+ commits as a result of the review process. 1 original, +1 for each review.
Then you (committers/maintainers) can squash them all again down to 1 commit when the PR gets merged.

Is that roughly correct?

For someone new to re-basing (and contributing in general) I did followe the contrib guide but it was a little unclear to me (also I once force pushed and lost changes to one of my first PRs :) . Anyway I was planning to clarify and add these detailed steps to the contrib guide to make it clearer for others, so really appreciate your feedback here :)

@codecov-io
Copy link

codecov-io commented Dec 11, 2019

Codecov Report

Merging #6758 into master will decrease coverage by 0.28%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6758      +/-   ##
==========================================
- Coverage   84.53%   84.24%   -0.29%     
==========================================
  Files         672      672              
  Lines       38153    38162       +9     
==========================================
- Hits        32252    32150     -102     
- Misses       5901     6012     +111
Impacted Files Coverage Δ
airflow/jobs/scheduler_job.py 89.26% <100%> (+0.01%) ⬆️
airflow/jobs/backfill_job.py 91.88% <100%> (ø) ⬆️
airflow/models/taskinstance.py 94.01% <100%> (+0.07%) ⬆️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 78.2% <0%> (-20.52%) ⬇️
airflow/contrib/operators/ssh_operator.py 83.33% <0%> (-1.29%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53aa975...47d55f0. Read the comment docs.

@potiuk
Copy link
Member

potiuk commented Dec 11, 2019

Some comments (this is the workflow I use).

To keep my local master up to date:
git checkout master
git fetch upstream
git merge upstream/master master

I very rarely keep local master sync-only when I need to create a new branch locally. If you need it, it should be enough to just do git pull --ff-only. We (almost) never push --force to master. So any pull should be fast-forward only. If it is not, the --ff-only should cause it to fail (but this almost never happen - I did it once and regret it).

Update my fork's master:
git push -f origin
To keep my feature branch up to date:
git checkout
git rebase -i master
(squash into 1 commit per commit guidelines)
git push -f origin

What I usually do to keep my feature branch is I rebase 'onto' the upstream/master branch. It needs a bit more consciousness to know where your commit started. I look at git log and see the last commit before those that I want to rebase and I do git rebase <hash_of_the_first_commit_I_do_not_want_to_rebase> --onto upstream/master. It's a little more involved as you need to look up where you branched-off but it's usually pretty obvious.

Here is where I think I'm going wrong.
I had thought that exactly 1 commit was needed.

During the review 1 commit is not needed - especially if you work incrementally and add fixups after review. This makes it easier to review as you apply fixes. So what I usually do is:
git commit --fixup HEAD or git commit --fixup <hash_of_the_commit_I_want_to_update>. Then at some point of time it is enough to do git rebase -j <hash-of-some-older-commit> --autosquash and those fixups will be automatically joined with the commits they were fixing. But for review purpose I keep those fixup for some time so that reviewers can look only at the changes I made as result of the review.

So if I understand you correctly, the changes made because of code review should not be squashed into the first commit, i.e there should be 2+ commits as a result of the review process. 1 original, +1 for each review.
Then you (committers/maintainers) can squash them all again down to 1 commit when the PR gets merged.

Is that roughly correct?

Correct. Or even better you can rebase + squash just before merge - which is a good practice as there might be for example new pre-commit checks added independently.

For someone new to re-basing (and contributing in general) I did followed the contrib guide but it was a little unclear to me (also I once force pushed and lost changes to one of my first PRs :) . Anyway I was planning to clarify and add these detailed steps to the contrib guide to make it clearer for others, so really appreciate your feedback here :)

I think we do not have a "standard" workflow - we leave people quite a bit of freedom how they work. But maybe we should. Mine is just an example and is a bit involved and 'geeky'. Regarding losing changes - it's really hard to lose anything in git. Even if you force-push there is always git reflog to the rescue and you can checkout any commit (by hash) you had in your local .git until it's garbage collected (which by default is 30 days or so). Learning reflog taught me to be a little more courageous when playing with git.

@ashb ashb merged commit d4a8afb into apache:master Dec 11, 2019
kaxil pushed a commit that referenced this pull request Dec 17, 2019
kaxil pushed a commit that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 19, 2019
@mik-laj mik-laj added area:webserver Webserver related Issues and removed area:UI labels Jan 6, 2020
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:webserver Webserver related Issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants