♻️ REFACTOR: package API/CLI/documentation #74

chrisjsewell · 2021-08-02T00:41:40Z

This PR re-writes key parts of the package (a) to add additional functionality, and (b) with a view to eventually exposing this CLI in https://jupyterbook.org/.
Key changes:

- `stage`/`staging` has been renamed to `notebook`, and a `project` concept has been added, i.e. you add notebooks to a project, then execute them.
- Notebook `read_data` is specified per notebook in the project, allowing multiple file types to be read/executed via the CLI (e.g. MyST Markdown files via jupytext). Before, the read functions were passed directly to the API methods.
- The executor can be specified with `jbcache execute --executor`, and a parallel notebook executor has been added.
- Improved execution status indicator in `jbcache project list`, and other CLI improvements.
- Re-write of documentation, including a better front page with a quick start guide, and a better logo.
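
As a rough illustration of the parallel executor idea in the list above, here is a minimal standard-library sketch. The names `execute_all` and `run_one` are hypothetical and are not part of the jupyter-cache API; `run_one` stands in for whatever actually executes a single notebook.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def execute_all(uris, run_one, pool_cls=ProcessPoolExecutor, max_workers=None):
    """Run ``run_one(uri)`` for every URI in a worker pool.

    Returns two dicts keyed by URI: successful results and raised exceptions.
    A failure in one notebook does not stop the others (an assumption made
    for this sketch, not a statement about the real executor).
    """
    results, errors = {}, {}
    with pool_cls(max_workers=max_workers) as pool:
        # Map each future back to the URI it was submitted for.
        futures = {pool.submit(run_one, uri): uri for uri in uris}
        for future in as_completed(futures):
            uri = futures[future]
            try:
                results[uri] = future.result()
            except Exception as exc:
                errors[uri] = exc
    return results, errors
```

With the default process pool, `run_one` has to be a picklable, module-level function. Passing `pool_cls=ThreadPoolExecutor` is a convenient way to try the sketch interactively.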

Rather than passing an optional `converter` to methods, we now store staged files with a specific reader key. The key relates to an entry point (in group `jcache.readers`) of a dynamically loaded reader. Also, the `jupyter_executors` entry-point group has been renamed to `jcache.executors`, and `importlib-metadata` is used to load entry points.
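
To make the reader-key lookup concrete, here is a minimal sketch of resolving a key to a callable through entry points. It uses the standard-library `importlib.metadata` (the `importlib-metadata` package exposes the same interface); the helper names `find_entry_points` and `load_reader` are assumptions for illustration and are not the package's actual functions.

```python
from importlib.metadata import entry_points


def find_entry_points(group):
    """Return the installed entry points registered under ``group``."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer interface (Python 3.10+ and recent importlib-metadata).
        return list(eps.select(group=group))
    # Older interface: a plain dict mapping group name -> entry points.
    return list(eps.get(group, []))


def load_reader(key, candidates=None, group="jcache.readers"):
    """Load the reader whose entry-point name equals ``key``.

    ``candidates`` lets a caller supply entry points directly (useful for
    testing); by default the installed ones in ``group`` are searched.
    """
    if candidates is None:
        candidates = find_entry_points(group)
    for entry_point in candidates:
        if entry_point.name == key:
            # Imports the target module and returns the referenced object.
            return entry_point.load()
    raise KeyError(f"No reader named {key!r} in group {group!r}")
```

Storing only the key alongside each notebook record keeps the record serialisable, while the actual reader function is resolved at the time the file is read.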

codecov · 2021-08-02T00:43:34Z

Codecov Report

Merging #74 (1d3fd4b) into master (7917c68) will increase coverage by 1.51%.
The diff coverage is 77.63%.

@@            Coverage Diff             @@
##           master      #74      +/-   ##
==========================================
+ Coverage   81.33%   82.85%   +1.51%     
==========================================
  Files          17       20       +3     
  Lines        1045     1318     +273     
==========================================
+ Hits          850     1092     +242     
- Misses        195      226      +31

Flag	Coverage Δ
pytests	`82.85% <77.63%> (+1.51%)`	⬆️

Impacted Files	Coverage Δ
jupyter_cache/cli/commands/cmd_main.py	`100.00% <ø> (ø)`
jupyter_cache/cli/utils.py	`56.25% <56.25%> (ø)`
jupyter_cache/cli/commands/cmd_project.py	`64.91% <64.91%> (ø)`
jupyter_cache/cli/commands/cmd_cache.py	`72.66% <65.78%> (-2.73%)`	⬇️
jupyter_cache/utils.py	`87.93% <66.66%> (-6.07%)`	⬇️
jupyter_cache/cli/__init__.py	`72.00% <72.00%> (ø)`
jupyter_cache/entry_points.py	`75.00% <75.00%> (ø)`
jupyter_cache/cli/commands/cmd_notebook.py	`76.19% <76.19%> (ø)`
jupyter_cache/cache/main.py	`88.10% <76.78%> (+1.43%)`	⬆️
jupyter_cache/cache/db.py	`85.33% <78.51%> (-1.59%)`	⬇️
... and 9 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7917c68...1d3fd4b. Read the comment docs.

chrisjsewell · 2021-08-04T06:33:13Z

@choldgraf there is some more I probably want to do here, but it would be good if you could have a skim of the new documentation and give some feedback, ta

choldgraf · 2021-08-04T06:35:59Z

Cool, I'll assign myself so I remember

chrisjsewell · 2021-08-04T06:46:58Z

Some notes on possible TODOs

jbcache project:

- allow for different kernel to one specified in notebook (see Expose kernel_name of nbclient.nbexecute() #63)
- retain record of last executed cache record and allow to diff against it
- ~~remove related cache record (e.g. jbcache project invalidate <pk/uri>)~~
- allow to infer reader from extension when adding files
- disable specific files (so they are not re-executed, e.g. jbcache project enable/disable <pk/uri>)
- allow to add assets to existing file without having to remove then re-add it
- include assets in hash?
- storing full failed notebooks somewhere to look at, on top of just the traceback

docs:

- Add API/click autodoc

myst-nb / jupyter-book integration:

- how to make default path to cache the same one that they use
- how to automatically load custom notebook readers (or at least allow to skip files with unknown readers when calling jbcache execute rather than excepting)

other:

- add mypy type checking
- ~~more jupytext integration~~

chrisjsewell · 2021-08-04T22:33:17Z

Related to this, I also just opened jupyter/nbclient#151

mmcky · 2021-08-05T10:05:50Z

@chrisjsewell these improvements look great. I particularly like the new terminology with project vs staged etc.

One question I had in relation to the executor: do you think it might be a good time to add optional dependencies for execution? In jupinx we built in the ability to say that a lecture or page needed to be executed only after another one has already been executed. This enabled us to reuse files or outputs from one lecture in another.

I don't think this is a high-priority issue, but it is a nice-to-have feature.
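
One way such execution dependencies could be ordered is with a topological sort. The following is only a sketch under the assumption that dependencies are declared as a mapping from each notebook to the notebooks that must run first; the function name `execution_order` is hypothetical, and nothing like it exists in this PR. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return notebooks in an order that respects their prerequisites.

    ``dependencies`` maps a notebook URI to the set of URIs that must be
    executed before it. Raises ``ValueError`` on circular dependencies.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"Circular notebook dependency: {exc.args[1]}") from exc
```

For example, `execution_order({"b.ipynb": {"a.ipynb"}, "a.ipynb": set()})` places `a.ipynb` before `b.ipynb`.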

akhmerov

> Related to this, I also just opened jupyter/nbclient#151

At a glance, that approach seems to fail with the way hash keys are implemented now. Imagine the user modifies a cell with skip-execution tag applied. This should definitely result in the same key. Now, however, all code cells are used to compute the hash key.
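
The behaviour described above could look roughly like the sketch below: a key computed only from code cells that will actually be executed. This is an illustration, not the package's hashing code; the function name `hash_code_cells` is hypothetical, the tag name is taken from the comment above, and a real key would likely cover more than cell sources (for example kernel information).

```python
import hashlib
import json

SKIP_TAG = "skip-execution"  # tag name as used in the comment above


def hash_code_cells(notebook, skip_tag=SKIP_TAG):
    """Hash the sources of code cells, ignoring cells tagged to be skipped.

    ``notebook`` is a plain dict in the nbformat JSON layout.
    """
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if skip_tag in cell.get("metadata", {}).get("tags", []):
            continue  # edits to skipped cells must not change the key
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        sources.append(source)
    payload = json.dumps(sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With this approach, two notebooks that differ only inside a skipped cell produce the same key, while a change to any executed code cell produces a different one.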

Also parallel execution is amazing! I'd love to cut down on those 1 hour build times.

choldgraf

This is a nice re-work! I focused on the documentation in this review - in general I think it's good and does a nice job of explaining the end-to-end functionality of the Python API and the CLI. My main questions and comments were around nomenclature and making sure that some of the ideas are explained in a clear way. I tried to note where I was a bit confused, as presumably this is where others will be confused as well! Happy to take another look if you make some changes!

choldgraf · 2021-08-05T23:23:24Z

docs/conf.py

+                shutil.rmtree(path)
+            return []
+
+    class JcacheCli(SphinxDirective):


Can you provide comments for what these classes do so that others have extra context? I guess it's a developer-friendly tool for these docs?

added docstring

docs/develop/contributing.md

docs/index.md

choldgraf · 2021-08-06T14:37:56Z

docs/index.md

+
+```{jcache-cli} jupyter_cache.cli.commands.cmd_main:jcache
+:command: execute
+:args: --executor local-serial


Since this is the first time people have seen the execute command, I'd recommend leaving out any extra arguments like --executor until you can explain what they mean in a subsequent step.

choldgraf · 2021-08-06T14:40:13Z

docs/index.md

+
+```{jcache-cli} jupyter_cache.cli.commands.cmd_project:cmnd_project
+:command: merge
+:args: 1 _executed_notebook.ipynb


It's unclear where this _executed_notebook.ipynb file came from. Did you create it somewhere?

and improved wording

choldgraf · 2021-08-06T14:56:38Z

docs/using/cli.md

+You can diff any of the cached notebooks with any (external) notebook:
+
+```{jcache-cli} jupyter_cache.cli.commands.cmd_cache:cmnd_cache
+:command: diff-nb


given that all of the things in the cache are notebooks, why not just call it diff instead of diff-nb?

choldgraf · 2021-08-06T14:58:56Z

docs/using/api.ipynb

   "source": [
    "(use/api)=\n",
    "\n",
    "# Python API"
-   ]


meta question: could we make api.ipynb a MyST-NB notebook? That way it would be much easier to review and diff

choldgraf · 2021-08-06T15:00:11Z

docs/using/api.ipynb

  },
  {
   "cell_type": "markdown",
-   "metadata": {},
   "source": [
    "(use/api/cache)=\n",
    "\n",
    "## Cacheing Notebooks"


I think Cacheing Notebooks should come after the staging/execution examples, since that would mirror the same structure that the Command-Line page uses. And since staging/execution is more common than Cacheing, presumably?

choldgraf · 2021-08-06T15:02:21Z

docs/using/api.ipynb

  },
  {
   "cell_type": "markdown",
-   "metadata": {},
   "source": [
    "Notebooks can be staged, by adding the path as a stage record.\n",


Should this be renamed something like ## Add notebooks to a project for execution?

And in general this section uses "staging", "the staged notebook" etc, rather than "project" terminology, which is a bit confusing as I'm not sure how "staging" and "project" relate to one another

choldgraf · 2021-08-06T15:04:43Z

docs/using/api.ipynb

  },
  {
   "cell_type": "code",
   "execution_count": 26,
-   "metadata": {},
+   "source": [
+    "cache.merge_match_into_file(\n",


does this update the cache itself, or simply return a notebook that is merged? We should make this clear via a note or something

chrisjsewell · 2022-01-13T07:09:16Z

Just wondering what your thoughts are on lecture dependency?

It could be possible, but certainly not in this PR

chrisjsewell · 2022-01-13T07:25:32Z

Ok @choldgraf and @mmcky, all changes applied from our discussion, so good to go:

$ jcache
Usage: jcache [OPTIONS] COMMAND [ARGS]...

  The command line interface of jupyter-cache.

Options:
  -v, --version       Show the version and exit.
  -p, --print-path    Print the current cache path and exit.
  -a, --autocomplete  Print the autocompletion command and exit.
  -h, --help          Show this message and exit.

Commands:
  cache     Work with cached execution(s) in a project.
  notebook  Work with notebook(s) in a project.
  project   Work with a project.

$ jcache project
Usage: jcache project [OPTIONS] COMMAND [ARGS]...

  Work with a project.

Options:
  -p, --cache-path TEXT  Path to project cache.  [default: (.jupyter_cache)]
  -h, --help             Show this message and exit.

Commands:
  cache-limit  Get/set maximum number of notebooks stored in the cache.
  clear        Clear the project cache completely.
  execute      Execute all outdated notebooks in the project.
  version      Print the version of the cache.

$ jcache notebook
Usage: jcache notebook [OPTIONS] COMMAND [ARGS]...

  Work with notebook(s) in a project.

Options:
  -p, --cache-path TEXT  Path to project cache.  [default: (.jupyter_cache)]
  -h, --help             Show this message and exit.

Commands:
  add              Add notebook(s) to the project.
  add-with-assets  Add notebook(s) to the project, with possible asset...
  clear            Remove all notebooks from the project.
  execute          Execute specific notebooks in the project.
  info             Show details of a notebook (by ID).
  invalidate       Remove any matching cache of the notebook(s) (by ID/URI).
  list             List notebooks in the project.
  merge            Create notebook merged with cached outputs (by ID/URI).
  remove           Remove notebook(s) from the project (by ID/URI).

$ jcache cache
Usage: jcache cache [OPTIONS] COMMAND [ARGS]...

  Work with cached execution(s) in a project.

Options:
  -p, --cache-path TEXT  Path to project cache.  [default: (.jupyter_cache)]
  -h, --help             Show this message and exit.

Commands:
  add                 Cache notebook(s) that have already been executed.
  add-with-artefacts  Cache a notebook, with possible artefact files.
  cat-artefact        Print the contents of a cached artefact.
  clear               Remove all executed notebooks from the cache.
  diff                Print a diff of a notebook to one stored in the cache.
  info                Show details of a cached notebook.
  list                List cached notebook records.
  remove              Remove notebooks stored in the cache.

chrisjsewell · 2022-01-25T11:06:58Z

Ok, I will take the silence as implicit consent 😅 and merge.
This will be in a 0.5 release, so obviously won't impact myst-nb/jupyter-book straight away

mmcky · 2022-01-26T05:01:12Z

thanks @chrisjsewell -- I look forward to the new cli. 👍

choldgraf · 2022-01-26T20:28:47Z

Cc @jjalaire - who is using this inside of quarto (I believe?). This is changing up the API a little bit so you might want to pin versions and double check your usage to make sure it still works!

chrisjsewell added 3 commits July 31, 2021 03:47

♻️ REFACTOR: Split basic executor -> local-serial & temp-serial

1376353

🔧 MAINTAIN: remove unnecessary pass

5197318

chrisjsewell added 12 commits August 2, 2021 04:56

♻️ REFACTOR: stage -> project

3be9692

It was felt that this is conceptually easier to understand, i.e. it is a list of records for each notebook (& associated data) in the project, rather than just a staging area for pre-executed notebooks.

♻️ REFACTOR: Consolidate remove-ids/remove-uris -> remove

dfba7b0

For `jcache project` commands, and remove `--all` option, in favour of separate `jcache project clear`

✨ NEW: Add jbcache project merge

fe4eaa6

To write out a notebook merging the project file with its cached outputs.

📚 DOCS: Update execution

86a57e1

📚 DOCS: Update logo

3519065

📚 DOCS: Update introduction page

c22fceb

🧪 TESTS: Add CLI execute test

6cae353

♻️ REFACTOR: Move critical CLI dependencies to install_requires

a35c364

👌 IMPROVE: CLI: Add status to project list

6d780fd

✨ NEW: Parallel (multiprocess) notebook execution

47ab941

The execution logic was also refactored, to reduce code duplication. Note, artefact retrieval has been removed for now, until the logic can be improved.

🔧 MAINTAIN: Re-add python 3.6 CI

0a5d1fe

📚 DOCS: Re-write CLI tutorial

6ee0f79

chrisjsewell changed the title ~~🔀 MERGE: Improve notebook execution~~ 🔀 MERGE: Refactor package Aug 4, 2021

chrisjsewell requested a review from choldgraf August 4, 2021 06:31

choldgraf self-assigned this Aug 4, 2021

📚 DOCS: Improve favicon

bd7493b

chrisjsewell requested a review from mmcky August 4, 2021 22:16

akhmerov reviewed Aug 6, 2021

choldgraf reviewed Aug 6, 2021

chrisjsewell added 15 commits January 12, 2022 13:48

Remove click-completion

8e291bf

click v8 has improved completion handling

Update index.md

819f40c

Update index.md

8e8df2e

Merge branch 'master' into improve-exec

f3c3f9a

update deps

b10fbfe

expose kwargs of execution

5520614

Update tests.yml

b58c14f

Update tests.yml

a2772c7

update pytest

7d7e25d

diff-nb to diff

ca14dcf

Make the stored read_data a dict

2c16de5

So that we can make it more flexible, and allow for the myst-nb custom formats (with kwargs)

add version to setting table, add invalidate command

626a387

jcache project -> jcache notebook

add3a92

Move config,clear and execute to jcache project, add version …

06741f0

…to caches, move `--cache-path` to sub-levels

add jcache project execute --force

ff25419

chrisjsewell added 2 commits January 13, 2022 08:15

Add exec_data field to nbproject

f9096e0

Change isort config

afd4444

chrisjsewell marked this pull request as ready for review January 13, 2022 07:21

chrisjsewell added 4 commits January 13, 2022 08:36

Upgrade to python 3.7

4c2b5b8

reduce version, so that docs build

c473dc2

add emojis to docs

979c051

Add "Why use jupyter-cache?" docs section

1d3fd4b

chrisjsewell changed the title ~~🔀 MERGE: Refactor package~~ ♻️ REFACTOR: package API/CLI/documentation Jan 25, 2022

chrisjsewell merged commit 065dcaf into master Jan 25, 2022

chrisjsewell deleted the improve-exec branch January 25, 2022 11:12
