
Packaging docs / documentation serving #13

Open
agoose77 opened this issue Jun 25, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@agoose77

agoose77 commented Jun 25, 2021

Hi @Carreau, this is a really cool project!

I was thinking about this today when considering docstrings in Literary.

Background

TL;DR Literary enables you to develop a package with Jupyter Notebooks. Notebooks form the source+docs+tests, whilst there is a build step that produces specialised views (e.g. pure Python source for PyPI, clean rendered docs, etc). During development, notebooks can be imported via an import hook.

I'm planning on treating specially tagged Markdown cells as docstrings, so that there can be rich markup instead of plain-old-text. The best approach for this (I think) is using MyST markdown (which jupyterlab-myst can render in the source notebook), which can then be used with Papyri and Sphinx to generate the intermediate Papyri IR.

Proposed Developer Features

There are a couple of features that I think might be useful for Papyri to support. To be clear, these are not "can you add this" ideas; rather, I am curious as to whether these sorts of features are aligned with your view of the direction of the project.

  • Support finding documentation by data_files.
    It might be useful for package authors to leverage the existing packaging system (PyPI / conda-forge) to distribute documentation. I'm not an expert here, so there might be unforeseen issues with this, but I wonder whether having numpy-docs installed as an extras package for numpy would be a good way of bundling documentation.
  • Add protocol hook __rich_doc__
    I would like to be able to annotate generated modules with metadata that would provide Papyri with the IR for each object. I imagined something like __papyri_ir__, or something analogous to __doc__. I did see your comment suggesting this might already be a feature of sorts: if Literary places the Markdown in __doc__, then it would seem that Papyri would implement a system to determine how to parse it. Maybe this is what I want?
  • Add documentation server entrypoint?
    Whilst the above point enables IR generation for Python objects, if this IR references other parts of the API, I would like to be able to generate that IR for Papyri. For my use case, notebooks will be the main document format, and during development I don't want users to have to run an update-docs-like command in order to be able to follow documentation links. It would be nice if there were some way besides data_files to provide documentation, i.e. a documentation server. The idea here is that Literary would provide a plugin to generate documentation modules on-the-fly for notebook-backed Python modules. This is a very use-case-specific request, so I realise that it might not align with the scope of the tool.
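One plausible wiring for the documentation-server idea is Python's standard entry-point mechanism. This is only a sketch of how discovery could work: the `papyri.servers` group name and the `discover_doc_servers` helper are hypothetical, not something papyri defines today.

```python
from importlib.metadata import entry_points

def discover_doc_servers(group="papyri.servers"):
    """Find plugins registered under a (hypothetical) entry-point group.

    A package like Literary could then declare in its pyproject.toml:

        [project.entry-points."papyri.servers"]
        literary = "literary.papyri:serve_docs"
    """
    try:
        eps = entry_points(group=group)      # Python >= 3.10 selection API
    except TypeError:
        eps = entry_points().get(group, [])  # Python 3.8/3.9 dict-style API
    # Map plugin names to entry points; defer .load() until actually needed.
    return {ep.name: ep for ep in eps}

servers = discover_doc_servers()
```

This keeps the tool itself ignorant of any particular plugin: anything installed in the environment that registers under the agreed group becomes discoverable without import-time side effects.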

One thing that might be confusing here is that I feel as though Papyri is attempting to do many different things, including:

  1. Better help rendering (e.g. object?, or help(object))
  2. Full documentation navigation.
    I'm not totally clear yet on how these two different things interact; __doc__ is usually a subset of the documentation, but in the existing IPython space there is no navigation, just __doc__ rendering.

I hope I've distinguished between the two carefully.

Regardless of my use case, I think this is a really exciting idea for the Jupyter ecosystem. Documentation rendering is the one thing that hasn't really moved forward with the rest of the Jupyter tools, and this project would make a big difference to the every-day experience of developers.

@agoose77 agoose77 changed the title Packaging docs Packaging docs / documentation serving Jun 25, 2021
@Carreau
Member

Carreau commented Jun 25, 2021

Thanks a lot for your interest and your ideas. I think I considered most of them at some point, and they are definitely in the long-term plan, but with caveats:

  • Support finding documentation by data_files.
    It might be useful for package authors to leverage the existing packaging system (PyPI / conda-forge) to distribute documentation. I'm not an expert here, so there might be unforeseen issues with this, but I wonder whether having numpy-docs installed as an extras package for numpy would be a good way of bundling documentation.

I think it is reasonable, though PyPI/conda and other package managers do both a lot more and not enough. For documentation we do not need dependency solving, and/or we could have multiple versions of the docs installed to know about future deprecations. Plus the install step needs to run the crosslink step, which sort of goes against the direction package managers are moving in, which is to just unpack a zip with the right links. So I have the feeling that's shoving a round peg into a square hole, though I'd love to have package managers like conda understand how to install the docs.

  • Add protocol hook __rich_doc__
    I would like to be able to annotate generated modules with metadata that would provide Papyri with the IR for each object. I imagined something like __papyri_ir__, or something analogous to __doc__. I did see your comment suggesting this might already be a feature of sorts: if Literary places the Markdown in __doc__, then it would seem that Papyri would implement a system to determine how to parse it. Maybe this is what I want?

That should be really limited: for security and performance reasons you really want to avoid executing code when viewing docs. I guess you could use placeholders in docs that are filled in when viewing live docs, but that's something I don't want to tackle initially; the first pass will be really narrow in scope.

  • Add documentation server entrypoint?
    Whilst the above point enables IR generation for Python objects, if this IR references other parts of the API, I would like to be able to generate that IR for Papyri. For my use case, notebooks will be the main document format, and during development I don't want users to have to run an update-docs-like command in order to be able to follow documentation links. It would be nice if there were some way besides data_files to provide documentation, i.e. a documentation server. The idea here is that Literary would provide a plugin to generate documentation modules on-the-fly for notebook-backed Python modules. This is a very use-case-specific request, so I realise that it might not align with the scope of the tool.

Yes, but I think this is just a tooling question. Not easy to implement, but with the right data structure on disk it should be pretty straightforward to have the equivalent of an editable install.

One thing that might be confusing here is that I feel as though Papyri is attempting to do many different things, including:

  1. Better help rendering (e.g. object?, or help(object))
  2. Full documentation navigation.
    I'm not totally clear yet on how these two different things interact; __doc__ is usually a subset of the documentation, but in the existing IPython space there is no navigation, just __doc__ rendering.

I think those two are the same – people just tend to see them differently. Right now the implementation only allows rendering __doc__, but there is no reason to stop there, and there is no reason object? and help(object) couldn't render properly and link to the rest of the docs. Many projects use autodoc to include __doc__ in Sphinx-rendered docs, which shows there is a desire to do that.

If you squint a bit you realize that __doc__ is just an implementation detail; with papyri, the doc of a function could be in an external file. I mean, it's sort of what Literary does with notebooks/tagged cells, right? You just go from __doc__ being rendered on the fly to __doc__ being parsed ahead of time, and we render the IR.
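The "parsed ahead of time" idea can be illustrated with a toy pass (purely illustrative; papyri's real IR is richer and produced by its own parser):

```python
def build_ir(obj):
    """Toy ahead-of-time pass: turn __doc__ into a pre-parsed structure.

    At view time, a renderer would consume this structure instead of
    re-parsing the raw docstring on every call to help()/obj?.
    """
    lines = (obj.__doc__ or "").strip().splitlines()
    return {
        # Using the ':' module/object separator discussed later in the thread.
        "qualname": f"{obj.__module__}:{obj.__qualname__}",
        "summary": lines[0] if lines else "",
        "body": [ln.strip() for ln in lines[1:] if ln.strip()],
    }

def example(x):
    """Return x doubled.

    Works for anything supporting multiplication.
    """
    return 2 * x

ir = build_ir(example)
```

The point is that once the IR exists on disk, the live `__doc__` attribute becomes a build-time input rather than the thing rendered directly.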

Regardless of my use case, I think this is a really exciting idea for the Jupyter ecosystem. Documentation rendering is the one thing that hasn't really moved forward with the rest of the Jupyter tools, and this project would make a big difference to the every-day experience of developers.

Thanks, that's my view as well, I would appreciate any work on any of the idea you presented; right now I'm mostly focusing on the really core idea, and the minimal useful subset (better rendering of __doc__) with ahead of time generation of IR in order to bootstrap the user base and projects adopting it. Then we can extend with the advance needs.

@agoose77
Author

Plus the install step needs to run the crosslink step, which sort of goes against the direction package managers are moving in, which is to just unpack a zip with the right links.

Is it possible to have this cross-linking done at access time, if Papyri serves as a documentation server? I agree with the sentiment that things should not be done at install time (otherwise we just move back towards setuptools-only installers). I think this probably relates to the idea of an editable install, though.

That should be really limited: for security and performance reasons you really want to avoid executing code when viewing docs. I guess you could use placeholders in docs that are filled in when viewing live docs, but that's something I don't want to tackle initially; the first pass will be really narrow in scope.

Hmm, I suppose this depends upon how Papyri integrates into IPython - how it knows which documentation corresponds to which token / object. Right now I know that we have a bit of everything going on with Jedi integration + IPython.

If you squint a bit you realize that __doc__ is just an implementation detail; with papyri, the doc of a function could be in an external file. I mean, it's sort of what Literary does with notebooks/tagged cells, right? You just go from __doc__ being rendered on the fly to __doc__ being parsed ahead of time, and we render the IR.

Yes, I quite agree. The IR is, after all, probably generated from the __doc__ at build time.

Thanks for the quick reply @Carreau, I'll throw some of my thoughts towards this when I get a chance.

@Carreau
Member

Carreau commented Jun 30, 2021

Is it possible to have this cross-linking done at access time, if Papyri serves as a documentation server? I think this probably relates to the idea of an editable install, though.

Yes/no. The forward references can; the backward ones cannot, or you would have to scan all files (assuming installation is unzip-only). You could store nodes/edges in SQLite files and join them all, but that could be a heck of a lot of file accesses.

I agree with the sentiment that things should not be done at install time (otherwise we just move back towards setuptools-only installers)

Well, "doing things at install time" falls into two categories IMHO:

  • Run arbitrary setup.py code.
  • Do something more than just unzip, where the code is part of the package manager.

The first one is problematic, but the second one is not. For example, conda and pip do install entry points, right?
IIRC conda does something similar as well, by rewriting placeholders in compiled packages with the actual paths
where packages are installed.

The problem with "execute at install" is when the code you execute is part of the package you install, as you have
no clue what it does and it's hard to undo or take into account.

Admittedly, my current choice is to do as much work as possible at install time, as I hate to wait even a tiny bit
when I'm working. My assumption is that I'm going to access the docs much more often than I'm going to install them,
and thus doing all the work once at install is a better trade-off than doing a subset of the work many times. Also,
when you install, you know things have changed, so you can optimally flush caches, whereas if you work at request
time, the state of the filesystem may have changed.

That is of course not what you want when developing a library; but I think that could be arranged. I've mostly been
limited by the speed of cross-linking in Python, but 1) the file structure is not optimal for parsing and
crosslinking, and 2) I have a Rust version that parses (but does not crosslink) about 200x faster than Python.

The data store is currently an abstraction, and nothing would prevent us from having a hook that makes some of the
files papyri sees as "static" actually dynamic for an editable install. I also believe that for development, asking
users/devs to use SQLite, or even something like Postgres/Neo4j if they want a faster rebuild, makes sense. Even in
a JupyterHub context, to track who consults what, it would be useful.

But for the time being I don't want to deal with the concurrency issues of SQLite, and files are way more debuggable
than any server.

I've also seen that SQLite now has the ability to store and hook into JSON; that might be the perfect way to store a
graph DB, but I did not manage to get it to work.
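For reference, here is a minimal sketch of what such a JSON-backed graph store could look like, using the json_extract function from SQLite's JSON1 extension (shipped with modern SQLite builds). The table layout and qualnames are invented for illustration, not papyri's actual schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (qualname TEXT PRIMARY KEY, ir TEXT)")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")

# Store one documentation "page" as JSON, plus a cross-link pointing at it.
conn.execute(
    "INSERT INTO nodes VALUES (?, ?)",
    ("numpy:linspace", json.dumps({"sections": ["Parameters", "Examples"]})),
)
conn.execute("INSERT INTO edges VALUES (?, ?)", ("numpy:arange", "numpy:linspace"))

# Backlinks become a single query instead of a scan over all files on disk.
backlinks = [row[0] for row in conn.execute(
    "SELECT src FROM edges WHERE dst = ?", ("numpy:linspace",))]

# JSON1 lets SQL reach inside the stored IR without loading it in Python.
first_section = conn.execute(
    "SELECT json_extract(ir, '$.sections[0]') FROM nodes "
    "WHERE qualname = 'numpy:linspace'").fetchone()[0]
```

This is exactly the backward-reference problem mentioned above: with edges in a table, "who links here?" is an indexed lookup rather than a scan of every unzipped docbundle.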

All of the above are just my thoughts; I can be convinced I'm wrong and proven otherwise.

My hope is that if docbundles are standardized, then multiple tools with different trade-offs will emerge (or become
options of papyri), and I'd like to see how far we can get with the current restrictions.

Though, if you want to try some changes, please go ahead; I would be happy to merge explorations and options to see
where this goes.

Hmm, I suppose this depends upon how Papyri integrates into IPython - how it knows which documentation corresponds to
which token / object. Right now I know that we have a bit of everything going on with Jedi integration + IPython.

For the ? operator I'd like to settle on fully qualified names, and if the object does not have a __module__ /
__class__ / __name__, that's up to the target library to fix. It's not perfect, but I believe it's good enough for
now. The only thing I may change is using : instead of . as the module/object separator. Some things are ambiguous
with dots only; for example, matplotlib.tri.tripcolor is ambiguous because the tripcolor function shadows its
submodule. With : it would not be: matplotlib.tri.tripcolor is the module, the function in matplotlib.tri would be
matplotlib.tri:tripcolor, and the one inside the submodule would be matplotlib.tri.tripcolor:tripcolor.
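The : separator is, in fact, what the stdlib and entry points already use. For example, pkgutil.resolve_name (Python 3.9+) accepts the module:object form, which sidesteps exactly this shadowing ambiguity:

```python
import pkgutil

# 'module:object' is unambiguous: everything left of ':' is imported as a
# module, everything right of it is resolved by attribute access.
sqrt = pkgutil.resolve_name("math:sqrt")
assert sqrt(4.0) == 2.0

# Works through subpackages too; with dotted-only names the resolver would
# have to guess where the module path ends and the attribute path begins.
join = pkgutil.resolve_name("os.path:join")
```

Entry-point specifications in packaging metadata ("pkg.module:func") follow the same convention, so adopting : would align papyri with existing ecosystem practice.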

For non-docstrings, I don't think it matters, as you would already be "in papyri" while you navigate, so as long as
each resource has a unique id, we should be fine. I'm also hoping that when using notebooks, papyri can live outside
of the IPython context: IPython would just tell the frontend "here is the doc and the object id" and the frontend
would access papyri by itself. That would allow sharing docs across many installations, and not put more load on the kernel.

OK, this is already too long, so I'll stop here; please feel free to experiment and send PRs to test ideas.

@melissawm melissawm added the enhancement New feature or request label Oct 26, 2023