BugFix: fix DAG doc display (especially for TaskFlow DAGs) #14564

Merged
merged 3 commits into from
Mar 2, 2021

Conversation

XD-DENG
Member

@XD-DENG XD-DENG commented Mar 2, 2021

Issue and Background

Take the example DAG tutorial_taskflow_api_etl. In 2.0.1, its DAG doc, written in Markdown, is not rendered properly.

[Screenshot: tutorial_taskflow_api_etl, Tree view in Airflow]

Rendered properly, it should look like the DAG Docs of another example DAG, tutorial:

[Screenshot: tutorial, Tree view in Airflow]

Why

Because of how TaskFlow DAGs are constructed, the lines of their __doc__ may start with spaces. Markdown treats lines indented by four or more spaces as a code block, so markdown.markdown() renders the doc as preformatted code instead of converting it into HTML headings and paragraphs, which in turn breaks the doc display in the UI.

This also affects non-TaskFlow DAGs, if users accidentally add leading spaces at the beginning of any line in the __doc__:

def x(x):
    """
    ### Header
    XD's test
    2nd line
    """
    return x * 2

import markdown

print('Case-1')
print(markdown.markdown(x.__doc__))

print('Case-2')
print(markdown.markdown('\n'.join(line.lstrip() for line in x.__doc__.split('\n'))))

Output:

Case-1
<pre><code>### Header
XD's test
2nd line
</code></pre>

Case-2
<h3>Header</h3>
<p>XD's test
2nd line</p>

Solution

This commit fixes the problem by always left-stripping each line of the doc md before it is rendered.



@XD-DENG XD-DENG added area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes labels Mar 2, 2021
@XD-DENG XD-DENG added this to the Airflow 2.0.2 milestone Mar 2, 2021
@github-actions

github-actions bot commented Mar 2, 2021

The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.

@kaxil kaxil merged commit 22e3a4c into apache:master Mar 2, 2021
@XD-DENG XD-DENG deleted the fix-doc-md-display branch March 3, 2021 08:11
ashb pushed a commit that referenced this pull request Mar 19, 2021

(cherry picked from commit 22e3a4c)
@Richiecakes

I believe this commit prevents the use of, e.g., nested lists in Markdown docs, where indentation is significant. Do whitespace and tabs need to be stripped from every line of all Markdown docstrings?
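
To illustrate the concern, here is a minimal sketch comparing the per-line left strip with removing only the common leading indentation. It is not the change made in this PR: `lstrip_each_line` is a hypothetical helper name that mirrors the one-liner from the PR description, and using `inspect.cleandoc` from the standard library is only one possible alternative, not something Airflow is confirmed to do. The doc text is a plain string rather than a real `__doc__`, so the result does not depend on how the interpreter stores docstrings.

```python
import inspect


def lstrip_each_line(doc: str) -> str:
    """Strip leading whitespace from every line (hypothetical helper name)."""
    return "\n".join(line.lstrip() for line in doc.split("\n"))


# An indented Markdown doc with a nested list, as it might appear in a docstring.
doc = """
    ### Steps
    - extract
        - read the order data
    - load
    """

# Per-line strip: every line loses all of its indentation, nested item included.
stripped = lstrip_each_line(doc)

# Common-indent removal: only the shared 4-space margin is dropped,
# so the nested item keeps its relative indentation.
dedented = inspect.cleandoc(doc)

print(stripped)
print("---")
print(dedented)
```

In the first result the nested item ends up at the same level as its parent, so the list is flattened. In the second, the header is no longer indented (so it should not be read as a code block), while the nested item is still indented relative to the list.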
