Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a trimmed version of README.md for PyPI #33637

Merged
merged 13 commits into from
Sep 1, 2023

Conversation

pankajkoti
Copy link
Member

closes: #33505


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member

potiuk commented Aug 23, 2023

I think we do NOT want to maintain yet another README. My vision of is that the README will be automatically extracted from the main README by pre-commit (so ideally it should be placed in generated dir and used from there,

We could do it dynamically as well, but that would not fly with mostly declarative way how we want to build packages. So automatically generated file which is just truncated version of README.md commited in generated seems a much better approach.

@pankajkoti
Copy link
Member Author

I think we do NOT want to maintain yet another README. My vision of is that the README will be automatically extracted from the main README by pre-commit (so ideally it should be placed in generated dir and used from there,

We could do it dynamically as well, but that would not fly with mostly declarative way how we want to build packages. So automatically generated file which is just truncated version of README.md commited in generated seems a much better approach.

yes, good idea, will rather work on the extraction via pre-commit.

@pankajkoti pankajkoti marked this pull request as draft August 30, 2023 06:51
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

@potiuk potiuk added this to the Airflow 2.7.1 milestone Aug 30, 2023
@potiuk
Copy link
Member

potiuk commented Aug 30, 2023

Really tiny NIT. in generated/README.md I explained how various files in that folder are generated. Would be great to add a paragrapgh descreiption on where PYPIREADME is from. When you look at that generated file, it's not obvious which commit generates it and how, so adding a short explanation will aid future contributors (and even future selves) to know where the hell that fille is from

@potiuk
Copy link
Member

potiuk commented Aug 30, 2023

Super - Super NIT. It would also be great if the first line (or split in two lines) of the generated PYPIREADME.md contains something like that

<!-- PLEASE DO NOT MODIFY THIS FILE. IT HAS BEEN GENERATED AUTOMATICALY FROM THE `README.md` IN MAIN AIRFLOW REPO BY `....` PRE-COMMIT. YOUR CHANGES HERE WILL BE AUTOMATICALLY OVERWRITTEN -->  

You can also safely remove the Licence header when you generate it (though this might be a bit complex so we can leave it). The ASF requirement is that licence headers are present in files that are human-prepared, it does not have to be added to files which are automatically generated from other files.

@potiuk
Copy link
Member

potiuk commented Aug 30, 2023

BTW. I REALLY wanted to add such comment in the .json files - but unfortunately, json specification does not allow comments (hence the README.md).

@@ -22,7 +22,7 @@ author = Apache Software Foundation
author_email = [email protected]
url = https://airflow.apache.org/
version = attr: airflow.__version__
long_description = file: README.md
long_description = file: generated/PYPI_README.md
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated folder is included in the MANIFEST.in https://github.com/apache/airflow/blob/main/MANIFEST.in#L42

So I am guessing this file will be available correctly and be used for the PyPI project listing description?
If someone can confirm my guess, would be great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can build the package with breeze release-management prepare-airflow-package --package-format sdist and extract it to check if it’s in the package.

Copy link
Member

@potiuk potiuk Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And use "both" to generate both packages and you can also check if it is in METADATA of the .whl package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

Yes, confirmed that the file is included both in the .tar.gz and in the wheel's pkg description

Screenshot 2023-08-30 at 2 44 28 PM

@pankajkoti pankajkoti marked this pull request as ready for review August 30, 2023 08:01
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now.

@pankajkoti
Copy link
Member Author

Static check is failing with the below error:

Check if licenses are OK for Apache................................................Failed
- hook id: check-apache-license-rat
- exit code: 1

Running command:
docker run -v /home/runner/work/airflow/airflow:/opt/airflow -t --user 1001:999 --rm --platform linux/amd64 ghcr.io/apache/airflow-apache-rat:0.13-2021.07.04 -d /opt/airflow --exclude-file /opt/airflow/.rat-excludes

ERROR: Could not find Apache licences in some files:

 !????? /opt/airflow/generated/PYPI_README.md

I tried adding generated/PYPI_README.md (this is already committed and fails in this PR checks), ./generated/PYPI_README.md, /opt/airflow/generated/PYPI_README.md to the .rat-excludes file, also tried to rebuild the image, but the error persists when I run breeze static-checks --type check-apache-license-rat --all-files --force-build locally.

Am I missing something obvious or there's more to do 🤔

setup.cfg Outdated Show resolved Hide resolved
Comment on lines 68 to 84
if not pypi_readme_file.exists():
pypi_readme_content = ""
else:
pypi_readme_content = pypi_readme_file.read_text()
with readme_file.open("r") as readme:
readme_content = readme.read()
generated_pypi_readme_content = PYPI_README_HEADER
for section in README_SECTIONS_TO_EXTRACT:
section_content = extract_section(readme_content, section)
generated_pypi_readme_content += section_content
if pypi_readme_content != generated_pypi_readme_content:
with pypi_readme_file.open("w") as generated_readme:
generated_readme.write(generated_pypi_readme_content)
console.print("\n[yellow]PyPI README.md content regenerated after update..\n")
console.print("[red]Aborting the commit")
console.print("You should check the changes made. Then simply 'git add --update .' and re-commit")
sys.exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just write the file every time? That simplifies this code a bit, and still shows a good enough message to the user:

Generate PyPI README...............................................................Failed
- hook id: generate-pypi-readme
- files were modified by this hook
Suggested change
if not pypi_readme_file.exists():
pypi_readme_content = ""
else:
pypi_readme_content = pypi_readme_file.read_text()
with readme_file.open("r") as readme:
readme_content = readme.read()
generated_pypi_readme_content = PYPI_README_HEADER
for section in README_SECTIONS_TO_EXTRACT:
section_content = extract_section(readme_content, section)
generated_pypi_readme_content += section_content
if pypi_readme_content != generated_pypi_readme_content:
with pypi_readme_file.open("w") as generated_readme:
generated_readme.write(generated_pypi_readme_content)
console.print("\n[yellow]PyPI README.md content regenerated after update..\n")
console.print("[red]Aborting the commit")
console.print("You should check the changes made. Then simply 'git add --update .' and re-commit")
sys.exit(1)
readme_content = readme_file.read_text()
generated_pypi_readme_content = PYPI_README_HEADER
for section in README_SECTIONS_TO_EXTRACT:
section_content = extract_section(readme_content, section)
generated_pypi_readme_content += section_content
with pypi_readme_file.open("w") as generated_readme:
generated_readme.write(generated_pypi_readme_content)

(If not, I have a number of comments on the original code, so lmk 👍)

Copy link
Member Author

@pankajkoti pankajkoti Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this works very well. Very clean, thank you :)

Yes, I had all the ugly code earlier and the rich dependency for the console output & failing the commit with sys.exit which is no longer needed 🚀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Writing file every time is fine. Now. The only thing that's left is just to ... run the pre-commit to fix some discrepancies :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still seeing static failures wrt #33637 (comment). Any suggestion for me that what we could do here?

@potiuk
Copy link
Member

potiuk commented Aug 31, 2023

Am I missing something obvious or there's more to do 🤔

Yep. I think you found the fix in the last commit :)

@pankajkoti
Copy link
Member Author

pankajkoti commented Aug 31, 2023

Yep. I think you found the fix in the last commit :)

I am not sure. I did add the file to .rat-excludes, but it still fails, waiting for another run in the CI now, but I'm not hopeful :(

.rat-excludes Outdated Show resolved Hide resolved
@alexbegg
Copy link
Contributor

BTW. I REALLY wanted to add such comment in the .json files - but unfortunately, json specification does not allow comments (hence the README.md).

@potiuk maybe it is it possible to switch these files to .jsonc, "JSON with Comments"? VSCode supports it for example (https://code.visualstudio.com/docs/languages/json#_json-with-comments), I am not sure if it is possible to easily include support in this project.

@potiuk
Copy link
Member

potiuk commented Aug 31, 2023

BTW. I REALLY wanted to add such comment in the .json files - but unfortunately, json specification does not allow comments (hence the README.md).

@potiuk maybe it is it possible to switch these files to .jsonc, "JSON with Comments"? VSCode supports it for example (https://code.visualstudio.com/docs/languages/json#_json-with-comments), I am not sure if it is possible to easily include support in this project.

Well. We need to read them in setup.py using nothing by stdlib of python in all versions we support. Otherwise I'd just use YAML. The file is read by python setup.py that should work without being in any specifc venv, no libraries installed, no IDEs or anything. Nothing beats simplicity of import json; provider_dependencies_dict =json.loads( Path(__name__.parent) / "generated" /"provider_dependencies.json").text)

@Taragolis
Copy link
Contributor

Well. We need to read them in setup.py using nothing by stdlib of python in all versions we support. Otherwise I'd just use YAML.

We just need wait a bit (3 years 😀 ) for min Python 3.11 and we could use tomllib

@pankajkoti
Copy link
Member Author

I added the license back to the generated file as the the exclude is not working.
Static checks seems to run fine now.

One failing test of MSSQL2017-latest, Py3.8: Core Providers[-amazon,google] Other Providers[amazon] WWW API Always CLI Providers[google] seems to be unrelated.

@phanikumv
Copy link
Contributor

I added the license back to the generated file as the the exclude is not working. Static checks seems to run fine now.

One failing test of MSSQL2017-latest, Py3.8: Core Providers[-amazon,google] Other Providers[amazon] WWW API Always CLI Providers[google] seems to be unrelated.

just re-triggered the failed test, lets see if its a flaky one

@potiuk potiuk merged commit 88a7a70 into apache:main Sep 1, 2023
64 checks passed
@ephraimbuddy ephraimbuddy deleted the trimmed-pypi-readme branch September 1, 2023 14:49
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Sep 1, 2023
ephraimbuddy pushed a commit that referenced this pull request Sep 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trim down the PyPI readme
8 participants