-
-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Developer Guide: Document practices for data files #37399
Developer Guide: Document practices for data files #37399
Conversation
- Large data files should not be added to the Sage source tree. | ||
Instead: | ||
|
||
- Create a separate git repository for them | ||
- Add metadata in your repository that make it a pip-installable | ||
package (distribution package) | ||
- Upload it to PyPI | ||
- Create metadata in ``SAGE_ROOT/build/pkgs`` that make your new | ||
pip-installable package known to Sage | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Some ideas:
- give https://github.com/sagemath/conway-polynomials and https://github.com/gmou3/matroid-database as examples
- suggest compressing large data files with xz, read transparently as in https://github.com/sagemath/conway-polynomials/blob/master/src/conway_polynomials/__init__.py#L116
- suggest trying
xz -T 0 -e
to compress, sometimes gives big gains for highly repetitive text files (without affecting the cost for decompression).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yes, linking to these projects as examples of best practices is a good idea. Will do (unless someone sends me a PR first... hint, hint)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tornaria are these points taken care of by the current version of the text already?
For accessing resources in packages, link to https://docs.python.org/3/library/importlib.resources.html#importlib.resources.files. Note everything else is deprecated. Basic examples:
Note: Edit: I added explicit |
Notes:
|
This code looks nice to me:
It seems like we want the |
You are right; a different default from |
This is no longer recommended, and PRs that update such uses | ||
are welcome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not un-recommend this until sage.interfaces
is a package or we remove support for python 3.9. The alternative is ugly, and is adding code that will have to be replaced again soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding code that will have to be replaced again soon
why is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
importlib_resources is only needed for namespace packages for python 3.9 so your try
hack is unecessary with python 3.10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please. It's not a "hack". These are the best practices.
And by documenting this method to import it, we avoid the "too much information" problem where we tell developers every little detail of what is supported on what version etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please. It's not a "hack". These are the best practices.
According to whom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, maybe second-best practices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now (with abf3ae6), at best practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give an external reference (i.e. not yourself)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to tell what you are looking for. https://importlib-resources.readthedocs.io/en/latest/using.html# leaves no doubt how to import from it. We have decided to conditionalize the dependency on the backport package in #36776 @orlitzky, hence here we conditionalize the import.
In case you are looking for this: You can see how it is used in projects in the wild in this code search:
https://github.com/search?q=%22from+importlib_resources+import%22&type=code&p=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They make the type-checker (mypy) happy. |
try: | ||
# Use backport package providing Python 3.11 features | ||
from importlib_resources import files | ||
except ImportError: | ||
from importlib.resources import files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | |
# Use backport package providing Python 3.11 features | |
from importlib_resources import files | |
except ImportError: | |
from importlib.resources import files | |
from importlib.resources import files |
maybe with a note about not using resources in namespace packages until 3.9 is removed support, i.e. apr 2024?. Really not worth doing this for 2 months and using the version of importlib.resources from python is better to avoid bleeding edge and stick to long term stable API.
Your code allows one, in principle, to use features from new importlib_resources
which are not available in the version of python used. But if those features are indeed used, this results in an incompatibility with a system where python is new enough to have importlib.resources
but where importlib_resources
is not installed (or not up to date). From that point of view is much better to stick to the stable API, meaning features that are common available in all the supported versions of python.
The only newer nicety I see (available on 3.12) is that one can use files()
to stand for files(__package__)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case of "importlib_resources
not up to date" is a real point. I'll make a change.
I don't agree with the rest what you're saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In abf3ae6, I've changed it to using an explicit version check (for 3.11!) instead of try... except
.
I'm using 3.11, not 3.9.
Note the section on "Language Standard" in the Developer's Guide: The idea is that we develop the Sage library to one language standard, and that's currently 3.9 + the backports for 3.11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until 3.9 is removed support, i.e. apr 2024?
We don't have a set schedule for this yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only newer nicety I see (available on 3.12) is that one can use files() to stand for files(package)
That's quite nice, so let's please merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until 3.9 is removed support, i.e. apr 2024?
We don't have a set schedule for this yet.
42 months as per NEP 29 is apr 2024. They call this best practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please @tornaria. Our project has not adopted NEP 29. If you want to know about that, read
https://github.com/sagemath/sage/wiki/NEP-29:-Python-version-strategy
if sys.version_info < (3, 11): | ||
# Use backport package providing Python 3.11 features | ||
from importlib_resources import files | ||
else: | ||
from importlib.resources import files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if sys.version_info < (3, 11): | |
# Use backport package providing Python 3.11 features | |
from importlib_resources import files | |
else: | |
from importlib.resources import files | |
from importlib.resources import files |
I already gave rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already know that I disagree, and I have explained it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't agree with the suggestion, feel free to label the PR as disputed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have taken this part out now to regain velocity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
1bbc922
to
8dcb34a
Compare
8dcb34a
to
6984347
Compare
Let's get this in please. |
I agree with @tornaria that it would be better to use |
But this document does not describe the use of importlib.resources in the standalone packages. |
Ok, but same answer for all the same reasons. |
Well, no. The modularization of the Sage library and its uses of namespace packages does not disappear just like that. |
The database should only be present in one distribution right? And if not, I think there's a decent consensus that we should chill for a while on the init file removal until namespace packages are well-supported upstream. |
No, Michael, there is no such consensus, and certainly not a "decent" one. That the Sage library uses namespace packages for the purpose of the modularization is documented policy in the developer's guide. |
For the separate distributions (conway-polynomials, matroid-database), using an ordinary package (with |
What we are trying to document here is to cover cases such as |
It's in the developer's guide because you put it there.
Handled by doing nothing and waiting for namespace package support upstream. Nobody needs this by tomorrow, and the maintenance to keep it working is a headache. |
None of this sounds like the beginning of an interesting discussion; rather it sounds like the end of a discussion. |
(to be used in place of ``importlib.metadata`` when Python older | ||
than 3.11 is in use), | ||
- `importlib_resources <../reference/spkg/importlib_resources>`_: | ||
If necessary, you can use it in place of ``importlib.resources`` | ||
when Python older than 3.11 is in use if you need a newer | ||
feature:: | ||
|
||
import sys | ||
|
||
if sys.version_info < (3, 11): | ||
# Use backport package providing Python 3.11 features | ||
from importlib_resources import files | ||
else: | ||
from importlib.resources import files | ||
|
||
- `typing_extensions <../reference/spkg/typing_extensions>`_ | ||
(to be used in place of ``typing``). | ||
(to be used in place of ``typing`` when any of the features | ||
introduced after Python 3.9.0 are used). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot about this, other that this the PR looks good to me.
I would remove almost all of this section, starting from "Some key language and library functions..." to the end, but I understand if you don't want to do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't forget about it. I actually changed the language so it does not unconditionally asks for it to be imported like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, let's not play games.
Two reviewers showed concern for this. You can have a different opinion, but if you accept to take it out "to regain velocity", then do it without tricks fully. I already stated my positive review conditional on this.
Edit: toned it down. I didn't intend to offend @mkoeppe at all. The underlying message is a constructive review to get the 95% of your PR that we all agree is a very nice improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonzalo, this is uncalled for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is your concern about this improvement of the documentation in the "Language Standard" section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really familiar with the functionality discussed here. The only thing I can do is moderate the conflict (if my understanding of it is good enough). Could this be a compromise: we leave the example code-block as is and add a note that some distributions no longer ship importlib_resources
for good reasons. Ideally we should add some URL
pointers that address these good reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really familiar with the functionality discussed here. The only thing I can do is moderate the conflict (if my understanding of it is good enough). Could this be a compromise: we leave the example code-block as is and add a note that some distributions no longer ship
importlib_resources
for good reasons. Ideally we should add someURL
pointers that address these good reasons.
Sorry I missed this before.
It seems we don't agree whether to recommend or not using importlib_resources
. In this
context, I think merging this nice PR without this small snippet is a good way to go, and I thought that's what @mkoeppe was proposing. That's why I gave it positive review conditional on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll split out the improvement of this section of the manual (orthogonal to the goal of this PR) out to a separate PR and will take @soehms's suggestion into account there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done now in the rebased branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and in #37654.
location as the Python code. This is referred to as "package data". | ||
|
||
The preferred way to access the data from Python is using the | ||
`importlib.resources API <https://importlib-resources.readthedocs.io/en/latest/using.html>`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line doesn't look rendered correctly when I follow the documentation link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 6045e5c, I hope
f9955ca
to
3cfab54
Compare
…e added to the Sage source tree
…s_file, add links to tutorial
…amples Co-authored-by: Gonzalo Tornaría <[email protected]>
…te direct access to data files using __file__ for now
…GE_EXTCODE deprecation
…t from importlib.resources, fix markup
6045e5c
to
d3a4313
Compare
Let's get this in please. |
Documentation preview for this PR (built with commit d3a4313; changes) is ready! 🎉 |
Thanks. |
…kages <!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> - sagemath#35203 added these packages as unconditional dependencies - prompted by a sporadic use of typing_extensions in the Sage library (see sagemath#34831) - motivated further by the demand to immediately drop support for Python 3.8 so that newer typing features can be used in the Sage library (sagemath#35404) - sagemath#36776 reduced the packages to conditional dependencies Here we improve the documentation in the section "Language standard" of the developer's guide so that it aligns with how the conditional dependencies are declared. Based on changes split out from sagemath#37399. ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [ ] The title is concise and informative. - [ ] The description explains in detail what this PR is about. - [ ] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation accordingly. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#37654 Reported by: Matthias Köppe Reviewer(s): Kwankyu Lee, Matthias Köppe
Topics here:
Follow-ups:
📝 Checklist
⌛ Dependencies