
Paper: Papyri: Better documentation for the Scientific Ecosystem in Jupyter #700

Merged: 53 commits, Jul 6, 2022


@Carreau Carreau commented May 27, 2022

See http://procbuild.scipy.org/ for logs generated by the build process.

Thanks for all your work on organising SciPy.

Editor: @stargaser

Reviewers: @wd15, @karthikmurugadoss

@karthikmurugadoss

The paper presented here discusses Papyri, an approach for unifying the documentation experience in the scientific Python ecosystem. Primarily, Papyri aims to decouple the process of creating documentation (using the presented IRD format) from actually rendering it (which can often be user-specific).

The paper describes the motivation and current state of Papyri quite well. The following are my major and minor comments.

Major Comments:

  1. Section on current implementation conveys a lot of information and is a bit difficult to follow for someone who isn't well-versed with existing documentation workflows. A few concepts that are introduced are RST parsing, CBOR representation, etc.
  2. The section on IRD file installation can benefit from a visual showing the different components (SQLite database, Raw storage on disk, etc.) and how these components interact with each other.
  3. Related to the above point, it is not clear why the object information is stored in 3 different places and what exactly is stored in each of these locations. More clarity here would be very helpful.
  4. The context and usefulness of the local graph visualization is not described. Does Papyri create this visual as a part of the documentation generation process? Or is it here specifically to highlight the connections into/from ndarray?

Minor Comments:

  1. There are typos and missing words in a number of places which need to be addressed.
  2. In the Current Implementation section, there are a number of referenced tools for which links to their sources would be helpful. e.g. Jedi, Pygments, Quart, Trio, etc.
  3. The numbering for sections can be improved.

@deniederhut
Member

@scoobies mark pending comment

from wd15:

> The intermediate format or IRD is a very important step for the
> community. Other tools can build from this format either by
> generating the documentation view or by generating the IRD from the
> source code. It would be nice in the paper if the authors could
> actually describe the details of the format or schema for the
> IRD. The schema itself could become a standard for documentation
> and, thus, making it transparent to the reader would be useful.

We've extended the paragraph that discusses this. As the IRD is still
changing rapidly, we don't believe a description that would be outdated in
a week would be useful in the paper.
from wd15:

> The intermediate format or IRD is a very important step for the
> community. Other tools can build from this format either by
> generating the documentation view or by generating the IRD from the
> source code. It would be nice in the paper if the authors could
> actually describe the details of the format or schema for the
> IRD. The schema itself could become a standard for documentation
> and, thus, making it transparent to the reader would be useful.

We want to avoid duplication, and would prefer to point to other
metadata sources. Currently we limit ourselves to only what is necessary.
@Carreau
Author

Carreau commented Jun 15, 2022

Many thanks for both reviews, I've tried to address most of the above points in different commits to try to make re-review easier.

Here is a small summary of the changes.

> The intermediate format or IRD is a very important step for the
> community. Other tools can build from this format either by
> generating the documentation view or by generating the IRD from the
> source code. It would be nice in the paper if the authors could
> actually describe the details of the format or schema for the
> IRD. The schema itself could become a standard for documentation
> and, thus, making it transparent to the reader would be useful.

I believe the project is too young to give a complete description of the IRD:
there are still regular changes to the format every 2 to 3 weeks depending on
the activity, so a detailed description would be premature. The IRD is
changing much less frequently than it did initially, but still too frequently IMHO.
I tried to clarify this.

> Suggestion: Pandoc is a tool that uses an AST and can convert
> between many markup and documentation formats. Would it be useful
> to mention Pandoc in the paper as an example of a successful tool
> that uses a similar approach?

Yes, this is a good tool, I've added it. I was also recently made aware of
https://markdoc.io/docs/nodes, which was released recently, but I haven't had a
chance to try it, so I didn't want to make major changes this late in the
proceedings process.

> Suggestion: As part of the IRD, package metadata is stored. Would
> it be useful to use an existing schema such as CODEMETA.yaml for
> this?

I've extended this section a bit. My main take so far is to limit the stored
metadata to the bare minimum and rely on metadata that is stored elsewhere. I
would much prefer something like codemeta or another JSON-LD format to be part
of the package on PyPI.

> Section numbering could be improved by using "X.Y" for
> subsections. Also, shouldn't the Introduction be numbered as Section 1?

We removed section numbering altogether; it should be handled at the
proceedings level. If the proceedings are compiled with a directive like
sectnum, then sections will be numbered automatically; if we also add numbers
manually, they will appear twice.
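
To illustrate the double-numbering concern, here is a minimal Python sketch. It is a toy stand-in for an automatic numbering pass, not docutils or the proceedings build itself; the `auto_number` helper and the section titles are assumptions made up for the example.

```python
# Toy stand-in for an automatic section-numbering pass (NOT docutils itself).
# `auto_number` and the titles below are hypothetical, for illustration only.
def auto_number(titles):
    """Prefix each top-level title with its 1-based position."""
    return [f"{i}. {title}" for i, title in enumerate(titles, start=1)]


unnumbered = ["Introduction", "Current implementation"]
hand_numbered = ["1. Introduction", "2. Current implementation"]

# Titles without manual numbers get exactly one number each.
numbered_once = auto_number(unnumbered)

# Titles that already carry a manual number end up with two.
numbered_twice = auto_number(hand_numbered)

print(numbered_once)
print(numbered_twice)
```

Leaving the titles unnumbered in the source keeps the result correct whether or not automatic numbering is enabled at the proceedings level.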

> Section on current implementation conveys a lot of information and is a bit
> difficult to follow for someone who isn't well-versed with existing
> documentation workflows. A few concepts that are introduced are RST parsing,
> CBOR representation, etc.

I've extended it where possible, but as with the comments above about the IRD
schema, I don't want to dive too far into the implementation, as it is still
changing quickly and should not be necessary to understand the idea.

I hope that in the future other competing projects will emerge that
produce/consume IRD bundles, and potentially make completely different technical
choices.

> The section on IRD file installation can benefit from a visual showing the
> different components (SQLite database, Raw storage on disk, etc.) and how
> these components interact with each other.

I've tried to clarify this as well, and added a schematic. I've also noted that
these choices are mostly driven by the use case I currently target and
could/should be reconsidered by an implementation with different targets.

> Related to the above point, it is not clear why the object information is
> stored in 3 different places and what exactly is stored in each of these
> locations. More clarity here would be very helpful.

As above, I've tried to clarify; let me know if this is clearer.

> The context and usefulness of the local graph visualization is not described.
> Does Papyri create this visual as a part of the documentation generation
> process? Or is it here specifically to highlight the connections into/from
> ndarray?

Thanks, this was indeed not clear; I've reworked this section. It was meant
to demonstrate that changes to the documentation can be made without having
to redo the generation step. This graph is indeed generated at render time,
updates depending on which libraries you have documentation installed for, and
gives an idea of the types of UI changes that could be implemented later.
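
As a rough illustration of why a render-time graph changes with the installed documentation, here is a minimal Python sketch. It does not use Papyri's actual data model or API: the `INSTALLED_BUNDLES` mapping, the link tuples, and the `local_graph` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-reference data, one entry per installed documentation
# bundle. Bundle names and links are invented for illustration; this is NOT
# Papyri's real storage format.
INSTALLED_BUNDLES = {
    "numpy": [("numpy.ndarray", "numpy.array"), ("numpy.zeros", "numpy.ndarray")],
    "scipy": [("scipy.sparse.csr_matrix", "numpy.ndarray")],
}


def local_graph(target, bundles):
    """Collect the links pointing into or out of `target` across all bundles."""
    graph = defaultdict(set)
    for links in bundles.values():
        for source, dest in links:
            if target in (source, dest):
                graph[source].add(dest)
    return {node: sorted(dests) for node, dests in graph.items()}


# Same target, computed at "render time" with different sets of bundles.
with_scipy = local_graph("numpy.ndarray", INSTALLED_BUNDLES)
numpy_only = local_graph("numpy.ndarray", {"numpy": INSTALLED_BUNDLES["numpy"]})

print(with_scipy)
print(numpy_only)
```

Because the graph is derived from whatever bundles are present when rendering, adding or removing a bundle changes the result without regenerating any documentation.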

> There are typos and missing words in a number of places which need to be addressed.

We tried to fix things the best we could, and would appreciate any pointers to
remaining mistakes.

> In the Current Implementation section, there are a number of referenced tools for which links to their sources would be helpful. e.g. Jedi, Pygments, Quart, Trio, etc.

This should be mostly fixed, apart from a couple of citations I need to expand
with the right DOIs.

@deniederhut
Member

Awesome! @wd15 and @karthikmurugadoss -- do you feel this paper is now ready for inclusion in the proceedings?

@karthikmurugadoss

Yes did another read through and it looks good to me!

@Carreau
Author

Carreau commented Jun 27, 2022

> Yes did another read through and it looks good to me!

Many thanks, that gives me extra motivation after the weekend!

@wd15

wd15 commented Jun 27, 2022

Looks much better. Nice work!

@stargaser
Contributor

@scoobies mark ready

@deniederhut deniederhut merged commit 219b06f into scipy-conference:2022 Jul 6, 2022