data/config path entry_points with minimal examples #209

bollwyvl · 2020-11-21T21:01:31Z

Background

Jupyter relies on a hierarchy of directories (user-level, environment-level, system-level, etc.) to store configuration and data. These directories are used by a number of Jupyter programs, for example:

Most applications based on the traitlets Configurable application class store configuration in JSON files in the configuration directories. They also aggregate conf.d-style configuration from these directories to determine settings of options.
Jupyter Notebook extensions copy their javascript assets into a data directory on installation for the server to serve
JupyterLab extensions copy their javascript assets into a data directory on installation for the server to serve.

Problem

Currently the environment level of this directory hierarchy is a fixed location based on sys.prefix. This means that packages need to copy their files into this directory at install time, which has several issues:

Copying files into a data directory uses the data_files feature of Python packages, which is deprecated in setuptools and is not supported in non-setuptools-based packagers like flit, poetry (see here), etc.
Data files are duplicated in the package bundle (once for copying into the data directory, once for being included in the actual package to install into site-packages). For some extensions, this a huge (like megabytes or tens of megabytes).
Development installs (pip -e) do not update data files when the source files change, so when developing a package, if something changes to the data files, you either have to copy them over again, or you have to run a command to make the appropriate data directory a symbolic link (not available on some platforms) to the source files.

(Also, it seems that sometimes these data file directories are not deleted. For example, in JupyterLab we actually create files at runtime in the data directory, and I think they don't get deleted when JupyterLab is uninstalled)

Proposed solution

Python has another mechanism that is explicitly designed for plugin systems called entry points. An entry point is a piece of metadata in a package that points to an arbitrary import from the package. This PR changes jupyter_core to look for two specific entry points in any installed package, each pointing to a list of paths, to augment the environment-level Jupyter config directories (the jupyter_config_paths entry point) and data directories (the jupyter_data_paths entry point). The result is:

Any package can add new environment-level Jupyter config and data directories. In practice, this means that a package can contain data or configuration in a directory that is installed in its site-packages directory, and can use the entry point to point Jupyter to that internal directory. Since this directory is internal to the package:
- the files are not duplicated in the package tarball
- development (un)installs automatically work, since the directory points to an internal directory in the package
- other python package managers can be used, like poetry using its include/exclude mechanism for files
non-Python programs can access this (and all other paths) by shelling out to jupyter --paths --json

Problems with the proposed solution

Entry points are based on importing a module to get a value, which potentially could be very expensive. We explore parsing the file first for literal values, and then importing as a last resort, which seems to alleviate this problem in the common case (setuptools does something similar for its attr handler for setup.cfg values).
neither entry_point group is cached
- an interactive installation with e.g. pip install or conda install would be able to update the search path, provided the application isn't doing its own caching...
  - this is important to maintain the observed behavior of data_files
  - because the import system is invoked, users of this system may wish to create a separate python_packages entry for these static assets, to avoid bringing in otherwise-unused runtime dependencies, e.g. pandas
- adding some debug logging around this will help pinpoint slow startup times
  - turns out there is no logging this deep in the stack. we could either:
    - add a log=None argument to the various calls
    - add a logger controlled by a JUPYTER_CORE_LOGLEVEL
if an entry_point is added or (its target is changed) in a package with an editable install, it must be reinstalled
- however, if only the return value of an existing entry_point is changed, no re-install is required
existing tools that were relying on indexing jupyter_*paths()
- this occurs in the test suite for jupyter_core itself: if one of the example packages is installed, the tests break
- these will have to be updated to inspect relative positions, e.g. was the user dir loaded before or after the env paths when JUPYTER_PREFER_ENV is set

Alternative solutions

setuptools also provides a way for a package to have custom metadata files in the egg or dist_info directories. This avoids the problems of importing or parsing an arbitrary python file to get the few strings that we need. However, it appears that this arbitrary metadata is not well supported outside of setuptools. See below for some experiments around this approach.

Example

See the setuptools example, specifically

jupyter_core/examples/jupyter_path_entrypoint_setuptools/setup.cfg

Lines 35 to 39 in 38e3acd

    
           [options.entry_points] 
        
           jupyter_config_paths = 
        
               entry-point-example-setuptools = entry_point_example_setuptools:JUPYTER_CONFIG_PATHS 
        
           jupyter_data_paths = 
        
               entry-point-example-setuptools = entry_point_example_setuptools:JUPYTER_DATA_PATHS

this approach requires a boilerplate MANIFEST.in and a setup.py in order to be installed from source

and the flit example, specifically

jupyter_core/examples/jupyter_path_entrypoint_flit/pyproject.toml

Lines 11 to 15 in 38e3acd

    
           [tool.flit.entrypoints.jupyter_config_paths] 
        
           entry-point-example-flit = "entry_point_example_flit:JUPYTER_CONFIG_PATHS" 
        
           [tool.flit.entrypoints.jupyter_data_paths] 
        
           entry-point-example-flit = "entry_point_example_flit:JUPYTER_DATA_PATHS"

for examples of how to use these entry points.

pyproject.toml is the only boilerplate file needed, and generates a setup.py
flit can also generate binary reproducible whl files (for python >=3.7) given the same version of flit_core

Original issue description

Hey folks! Thanks for keeping this foundational technology working.

data_files are making me sad enough that I'm willing to bring this up again.

This is a low-downstream-impact way we could allow python packages to not require the ill-supported data_files technique.

To test:

pip install -e .
cd examples/entry_point_example
pip install -e .
jupyter --paths
# should see that development environment in place
pip uninstall entry_point_example
jupyter --paths
# it's gone

I don't know if it really works yet, down the the n-th downstream, but seems it should if they are relying on jupyter_*_dir, and handling multiple paths already.

maartenbreddels · 2020-11-21T21:38:59Z

I see this as a good alternative to using data_files without overhauling the config system. I am a bit worried that it's hard to debug when things go wrong (if 15 directories will be scanned). Could we maybe provide a richer debug facility to see a particular config key, and how each directory is changing it. Grepping in 15 directories will not be fun. Or do I see a problem that does not exist, and are the debug options sufficient?

bollwyvl · 2020-11-21T22:02:26Z

Grepping in 15 directories will not be fun

Yep, there will be a lot of directories beyond the Big Four. No doubt some combination of jupyter --paths, jq, and xargs would make grep plausible, but that's no fun!

A JupyterApp base flag like --show-config which every app would inherit is a whacking good idea, even outside of this little draft. It could probably use difflib to generate a decently-readable representation of the config before each file was loaded, and show the final config, perhaps something like:

$> jupyter foo --show-config

environment variables:
- JUPYTER_PREFER_ENV_PATH: not set
- ...

paths:
- /etc/jupyter/jupyter_config.json: not found
...
- ~/my-project/src/my_project/etc/jupyter_foo_config.d/my-project.json:

    + SomeHasTraits:
    +   foo: bar
...
- ~/my-project/src/my_project/.venv/etc/jupyter_config.d/someone-elses-project.json:

      SomeHasTraits:
    -   foo: bar
    +   foo: baz

...
- ./jupyter_foo_config.json: not found

final:

    SomeHasTraits:
      foo: baz

sprinkle in some pygments (if available) and it would be pretty usable.

maartenbreddels · 2020-11-22T06:11:01Z

Indeed, exactly what I had in mind, that would help a lot

bollwyvl · 2020-11-22T14:46:07Z

Gah, looking at it: a lot of the complexity is duplicated between jupyter_server and notebook... while both would work with this PR, there's no simple way to add the above config inspection.

Perhaps the better short-term approach would be to invert it, with a separate package/command, e.g. offered jupyter show-config notebook FooHasTraits.bar. I guess this would work by overloading/monkeypatching config_manager_class (gaaah) with an instrumented subclass, and call initialize but not start.

Because of that complexity, this could probably not land here, unless the ConfigManager pattern was brought upstream, which sounds hard to coordinate.

bollwyvl · 2020-11-22T22:52:50Z

I have an unshaeably bad version of this, but it kinda works with notebook, jupyter_server, jupyterlab and voila installed:


getting jupyter_server_config from /etc/jupyter
got {}
getting jupyter_server_config from /usr/local/etc/jupyter
got {}
getting jupyter_server_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/nbclassic.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/voila.json
got {'ServerApp': {'jpserver_extensions': {'jupyterlab': True, 'nbclassic': True, 'voila.server_extension': True}}}
getting jupyter_server_config from /home/weg/.jupyter
got {}
getting page_config from /etc/jupyter/labconfig
got {}
getting page_config from /usr/local/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/.jupyter/labconfig
got {}
[I 2020-11-22 17:50:37.177 ServerApp] jupyterlab | extension was successfully linked.
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
getting jupyter_notebook_config from /etc/jupyter
got {}
getting jupyter_notebook_config from /usr/local/etc/jupyter
got {}
getting jupyter_notebook_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/voila.json
got {'NotebookApp': {'nbserver_extensions': {'jupyterlab': True, 'voila.server_extension': True}}}
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
[I 2020-11-22 17:50:37.322 ServerApp] nbclassic | extension was successfully linked.
[I 2020-11-22 17:50:37.322 ServerApp] voila.server_extension | extension was successfully linked.
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab extension loaded from /home/weg/projects/jupyter_showconfig_/envs/default/lib/python3.7/site-packages/jupyterlab
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab application directory is /home/weg/projects/jupyter_showconfig_/envs/default/share/jupyter/lab
[I 2020-11-22 17:50:37.342 ServerApp] jupyterlab | extension was successfully loaded.
[I 2020-11-22 17:50:37.345 ServerApp] nbclassic | extension was successfully loaded.
[I 2020-11-22 17:50:37.347 ServerApp] voila.server_extension | extension was successfully loaded.

Update: here's some better stuff, generated with rich:


 op     ┃ section_name                                      ┃ path                              ┃ old_value ┃ new_value                                                  
━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 stage  │                                                   │                                   │           │ before-init                                                
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 patch  │                                                   │ io.open                           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 patch  │                                                   │ BaseJSONConfigManager.get         │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ before-constructor                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ after-constructor                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ /etc/jupyter                      │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ /etc/jupyter                      │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ /usr/local/etc/jupyter            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ /usr/local/etc/jupyter            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ $SYS_PREFIX/etc/jupyter           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ jupyterlab.json                   │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ nbclassic.json                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ voila.json                        │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ $SYS_PREFIX/etc/jupyter           │           │ {                                                          
        │                                                   │                                   │           │   "ServerApp": {                                           
        │                                                   │                                   │           │     "jpserver_extensions": {                               
        │                                                   │                                   │           │       "jupyterlab": true,                                  
        │                                                   │                                   │           │       "nbclassic": true,                                   
        │                                                   │                                   │           │       "voila.server_extension": true                       
        │                                                   │                                   │           │     }                                                      
        │                                                   │                                   │           │   }                                                        
        │                                                   │                                   │           │ }                                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ /etc/jupyter/labconfig            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ /etc/jupyter/labconfig            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ /usr/local/etc/jupyter/labconfig  │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ /usr/local/etc/jupyter/labconfig  │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ $SYS_PREFIX/etc/jupyter/labconfig │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ $SYS_PREFIX/etc/jupyter/labconfig │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ $HOME/.jupyter/labconfig          │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ $HOME/.jupyter/labconfig          │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ /etc/jupyter                      │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ /etc/jupyter                      │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ /usr/local/etc/jupyter            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ /usr/local/etc/jupyter            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $SYS_PREFIX/etc/jupyter           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ jupyterlab.json                   │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ voila.json                        │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $SYS_PREFIX/etc/jupyter           │           │ {                                                          
        │                                                   │                                   │           │   "NotebookApp": {                                         
        │                                                   │                                   │           │     "nbserver_extensions": {                               
        │                                                   │                                   │           │       "jupyterlab": true,                                  
        │                                                   │                                   │           │       "voila.server_extension": true                       
        │                                                   │                                   │           │     }                                                      
        │                                                   │                                   │           │   }                                                        
        │                                                   │                                   │           │ }                                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 change │ kernel_spec_manager                               │ ServerApp                         │           │                                             
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 change │ ssl_options                                       │ ServerApp                         │ {}        │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ after-init                                                 
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ started                                                    
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ done

meeseeksmachine · 2021-02-09T01:41:56Z

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-do-we-uninstall-extensions-that-have-been-installed-using-jupyter-labextension-develop-overwrite/7845/5

Since entry points come from packages installed in the environment, I think it makes sense that they are treated like the environment paths

jasongrout · 2021-03-04T06:20:18Z

@bollwyvl - I made a PR to your PR with a few changes I thought would be good: bollwyvl#1. What do you think?

Entry point paths treated like environment paths

…nt-demo

bollwyvl · 2021-03-10T01:16:19Z

importable python attributes
...
if the module to be imported is an empty file

I'll see if i can get that together. It doesn't add any non-stdlib dependencies, and it gives us some wiggle room for the future. One issue with having dotted notation to the left of the : is that the top-level module will get imported:

If name is for a submodule (contains a dot), the parent module is automatically imported.

So I don't know yet how we might avoid the import behavior... i suppose tossing a warning might help folk back onto the happy path of, my_top_level_name:STATICALLY_PARSEABLE_STRING, but there's nothing for it with e.g. namespace modules.

Test to see how this scales

That'll be more fun 😝

bollwyvl · 2021-03-10T02:45:54Z

With 1000 packages (so 2000 entry_points):

220.28ms jupyter_config_path loaded
7.39ms	jupyter_config_path	fake-mod-999
...
5877.88ms jupyter_config_path	TOTAL

251.32ms jupyter_data_path loaded
...
0.22ms	jupyter_data_path	fake-mod-0
6437.48ms jupyter_data_path	TOTAL

10.38user 2.12system 0:12.53elapsed 99%CPU (0avgtext+0avgdata 16536maxresident)k

bollwyvl · 2021-03-10T02:59:53Z

Starting lab:

jupyter lab --no-browser --debug
# a delay
[D 2021-03-09 21:57:16.340 ServerApp] Searching 
[D 2021-03-09 21:58:16.178 ServerApp] 200 GET /api/contents?content=1&1615345084409 (127.0.0.1) 2.36ms

a minute to first pixels isn't too pretty 😢

bollwyvl · 2021-03-10T03:15:36Z

throwing in a little bit of cache helps immeasurably... well, measurably... but i haven't measured it.

def _entry_point_paths(ep_group):
    return _cached_entry_point_paths(ep_group, math.floor(time.time() / 100))

@functools.lru_cache(maxsize=10)
def _cached_entry_point_paths(ep_group, epoch):
    ...

jasongrout · 2021-03-10T06:19:42Z

a minute to first pixels isn't too pretty 😢

Ouch. I suppose it does have to open lots of files, which is going to be an even bigger pain on NFS and slower filesystems.

jasongrout · 2021-03-10T06:20:40Z

For completeness in documenting discussions in Jupyter around entry points, see also jupyter/notebook#2894.

jasongrout · 2021-03-11T16:39:13Z

Unfortunately, as far as I can tell, conda does not support general entry points, just console_script entry points: conda/conda#9951 (see also https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#python-entry-points). I think this means many of the packages in our ecosystem would have problems using entry points, as they provide official conda-forge builds as well.

bollwyvl · 2021-03-11T17:37:09Z

conda does not support general entry points

Put back your pitchforks, No worries here! A number of ecosystems (like pytest) would fall entirely apart. The reason conda-build handles them explicitly is the default wrappers generated for console_scripts are usually broken. So if your package isn't adding a jupyter <mycmd> or something, it wouldn't even have that custom entry under build.

bollwyvl · 2021-03-11T17:37:40Z

Also: tried the PEP 420 namespace package thing... might be a non-starter as totally unsurprisingly the files wouldn't be in place with a pth file (such as on windows) with an editable install. I think won't ever work for flit and won't work well on windows are two pretty damning findings.

jasongrout · 2021-03-11T18:01:10Z

Put back your pitchforks, No worries here!

I was just testing things to see if what conda recipes call "entry points" are in reality just "console_script entry points", and if it just left all other entry points alone that were already in the dist_info directories.

...and yes, installing a conda environment with jupyterlab nbconvert vaex and using importlib_metadata reveals lots of entry points.

Pitchfork being sheathed :).

jasongrout · 2021-03-12T07:07:29Z

FYI @bollwyvl, it looks like to me that if you have many entry points with the same name, the entrypoints package gives you back just one, whereas importlib_metadata gives you back all of them.

Edit: oh, never mind, you just have to use the get_group_all instead of get_group_named.

jasongrout · 2021-03-12T08:19:36Z

Here are my timings for jupyter --paths (macOS Catalina, 2015 15" macbook pro) after following these instructions from my branch for installing 1000 packages with entry points (so 2000 total entry points): https://github.com/jasongrout/jupyter_core/tree/0310f4a199ba7da60abc54bd9115f7da9a9cec25/examples/scale

Using the entrypoints package:

% jupyter --paths > /dev/null
261.42ms jupyter_config_paths loaded
745.81ms jupyter_config_paths	TOTAL
265.96ms jupyter_data_paths loaded
403.42ms jupyter_data_paths	TOTAL

Using importlib_metadata to get the entry points

% JUPYTER_ENTRY_POINT_IMPORTLIB=1 jupyter --paths > /dev/null
427.97ms jupyter_config_paths loaded
944.01ms jupyter_config_paths	TOTAL
417.37ms jupyter_data_paths loaded
561.50ms jupyter_data_paths	TOTAL

Also, it seems that JupyterLab is slowed down by about a second if the entrypoint paths are cached:

@functools.lru_cache(maxsize=10)
def _entry_point_paths(ep_group):

jaraco · 2021-03-13T16:18:07Z

Because we're catering to other languages, with e.g. jupyter --paths --json, we need these directories to exist after the python process sys.exits.

I should perhaps clarify this in the importlib.resources docs. The access to resources on the file system is meant to be for the duration of the context manager and that any expectation of use outside of that should be implemented downstream. In other words, if having a copy after the interpreter exits is a goal, I'd recommend to build a routine that manages that lifecycle and copies the content to the more permanent location. The Python import system has little control over the state of the system between interpreter runs (including pip uninstalls) and there's no proposed spec that I'm aware that would enable management of resources across runs.

And it sounds like unless we:
* require an as-yet-unreleased new dependency

* implement our own fuse-like filesystem aware of zipballs, e.g. `jupyter_cored`
... the existing importlib_* stuff isn't going to move us towards the goal of simplifying packaging static data assets and cross-language config files.

It does seem like importlib.resources doesn't implement a solution for this use-case. I should point out that importlib.resources supports much more than just the usual FileLoader and ZipLoader, but provides a protocol for other custom loaders (imagine loading modules from a database or from an RPC) to provide resources. Jupyter may not want to support those cases, but if it did, it would need to honor the interface presented by importlib.resources.

I think an entrypoint is... whatever the definer of the entrypoint says it is?

There is a definition for entry points and that definition does state that the value should be an importable module and optional name inside that module :/.

It does feel like mild abuse to violate this stated intention.

If there were a clear and obvious way for a package to expose another form of arbitrary metadata, that would be my recommendation, but I'm not sure if such an approach is readily feasible in the current metadata design, as I've not seen it before.

But I just tested it, and I think this could work. Instead of using entry_points against their design, provide your own metadata file for your hooks. In jaraco/develop@demo-metadata-writer, I've added a egg_info.writer to that project, allowing that project to act as a plugin for setuptools and write additional metadata for any project that includes it during the build.

Then, in the irc project (chosen arbitrarily), I include that as a build dependency and demo how the metadata is in fact reachable at run time:

irc main $ git diff
diff --git a/pyproject.toml b/pyproject.toml
index b6ebc0b..e6e81dd 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
 [build-system]
-requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.4.1"]
+requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.4.1", "jaraco.develop@git+https://github.com/jaraco/jaraco.develop@demo-metadata-writer"]
 build-backend = "setuptools.build_meta"
 
 [tool.black]

irc main $ pip-run -q .
Python 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import importlib.metadata as md
>>> md.distribution('irc').read_text('jupyter hooks.txt')
'hook is at foo/bar.py'

You'd still need a way to solicit the exact hooks for each project. I'd recommend soliciting the hooks from a pyproject.toml, something like

[tool.jupyter.hooks]
...

In this way, you're following the same principles as setuptools uses to solicit and expose entry points, but you're defining a custom format for a distinct purpose. You would have to design and implement the syntax for the file and parse it yourselves, but you probably want that anyway. The advantage is you have imminent control over the syntax and experience and you're still using the same metadata mechanism as entry points and other packaging patterns.

I'd be willing to help guide this implementation if it sounds attractive.

@jaraco is this a regression 🤔

Yes, and an intended one with importlib_metadata 3.5. Essentially, in order to deduplicate distributions correctly, the metadata for each distribution needs to be loaded. There are plans in python/importlib_metadata#283 to improve performance in light of that concern.

jasongrout · 2021-03-16T03:53:14Z

I should perhaps clarify this in the importlib.resources docs. The access to resources on the file system is meant to be for the duration of the context manager and that any expectation of use outside of that should be implemented downstream. In other words, if having a copy after the interpreter exits is a goal, I'd recommend to build a routine that manages that lifecycle and copies the content to the more permanent location. The Python import system has little control over the state of the system between interpreter runs (including pip uninstalls) and there's no proposed spec that I'm aware that would enable management of resources across runs.

Thanks for weighing in on this. Interestingly, one of the primary reasons for us to move to entry points over using data_files is that Python will manage the lifecycle of these files. Perhaps we're chasing a pipe dream if we need to build something generic enough to support any way a python module might be loaded, but also need the resources to be available outside of Python.

If there were a clear and obvious way for a package to expose another form of arbitrary metadata, that would be my recommendation, but I'm not sure if such an approach is readily feasible in the current metadata design, as I've not seen it before.

Nice, thanks! This looks like the approach I was attempting in the "Alternative Solutions" section in the issue description, in commit jasongrout@66351b0 (however, I was really fumbling to get the metadata out of the distributions, and I'm sure I made some inaccurate assumptions involving top_level.txt, for example). We decided to abandon this approach in favor of entry_points since various packagers like poetry, flit, etc., don't seem to support arbitrary metadata files, and having broad packager support was one of our design goals.

By the way, I've been thinking over the past few days about how to make finding a specific group of entry points potentially faster (I haven't benchmarked any experiments, so of course this should be treated with appropriate skepticism). It seems that getting a specific group of entry points requires reading in and parsing all entry point metadata files in the entire python installation, then filtering for the group I want. My hypothesis is that checking if a file exists is much faster than opening and parsing a file. If each group of entry points was stored in a separate file inside the dist_info/egg directory (for example, as files named by the group in a new entry_points directory), it may be much faster to scan for and parse just the data corresponding to a specific entry point group.

gaborbernat · 2021-03-16T08:20:09Z

Worth a try, but my guess is that the largest proportion of the slowness comes from the disk list operation 🤔 but would be great to see some benchmark numbers on discovery vs parsing overhead.

bollwyvl · 2021-03-24T23:56:26Z

Had some other thoughts about our scale issue. And, for reference, a quick look revealed that we are talking about a rough venn diagram of:

5,032 candidate packages on pypi that mention jupyter
833 candidate npm packages that may already be in pypi, or will be

so these scale concerns are not entirely academic bikeshedding.

Regarding benchmarking: yeah, the above were all with entrypoints vs my_module:foo and not even checking (much less importing) what foo was (assuming it to be my_module/foo). I didn't even try the exercise with importlib*. Things were slightly better, once there was some caching in place, but outside of just the startup issues, a number of other parts of the stack, e.g. jinja2.FileSystemLoader and tornado.web.StaticFileHandler were never designed to be used with such large search paths.

meeseeksmachine · 2021-04-28T19:45:08Z

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-could-data-files-be-improved/8972/2

telamonian · 2021-05-01T20:56:00Z

jupyter_core/paths.py

+    """
+    spec = importlib.util.find_spec(ep.module_name)
+    module = importlib.util.module_from_spec(spec)
+    origin = pathlib.Path(module.__file__).parent.resolve()


module.__file__ is None if there's no top-level __init__.py file in the module

telamonian · 2021-05-05T16:08:27Z

I brought this up in the jlab dev call today: https://hackmd.io/Y7fBMQPSQ1C08SDGI-fwtg?both#5-May-2021-Weekly-Meeting

@bollwyvl @jasongrout What's the status of the work on entry_points? After reading through this issue and a lot of related stuff, my sense is that no matter what we do, the fact that implementing jupyter entry_points requires checking 1000s of paths for plugins/config at runtime is going to cause problems we currently don't have

meeseeksmachine · 2021-11-18T00:49:47Z

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/package-managers-extension-paths/11723/2

blink1073 · 2022-10-28T19:59:20Z

Closing in favor of using shared-data in hatch. Thanks @bollwyvl and all for pushing on this front!

add data/config path entry_points, minimal example

1b133b5

bollwyvl mentioned this pull request Nov 21, 2020

RFP for successor to data_files-based extension discovery? jupyter-server/jupyter_server#351

Closed

remove JUPYTER_PATH_ENTRY_POINT from test, since not testing yet

929726b

jasongrout added 2 commits March 3, 2021 22:15

Make entry point paths come just after the environment-level paths.

8a110ef

Since entry points come from packages installed in the environment, I think it makes sense that they are treated like the environment paths

Add an example of a data file entry point.

7e17498

bollwyvl added 18 commits March 4, 2021 08:29

Merge pull request #1 from jasongrout/entry-point-demo

357a9b4

Entry point paths treated like environment paths

Merge remote-tracking branch 'upstream/master' into entry-point-demo

5f6bf67

Merge remote-tracking branch 'origin/entry-point-demo' into entry-poi…

ac3d218

…nt-demo

add flit example

1913537

flatten flit example

b88bb49

try more flit conf

cfe2aa1

try more flit conf

709ddf1

more clean up

f8fded8

clean up docs

b26db46

more docs changelog

308086c

add tests

5fbe543

flesh out development options

5044183

working on ci

f3c4ee5

clean up ci test

1faf4ab

more work on ci

cd9ab14

just deal with ubunut-latest warn for now

d95e9ee

fix pypy excursion

1293714

fix linux name

ebf260c

blink1073 mentioned this pull request Mar 18, 2021

Jupyter Server Notes 2021 jupyter-server/team-compass#4

Closed

jtpio mentioned this pull request Mar 22, 2021

Weekly Dev Meetings: Jan-Jul 2021 jupyterlab/frontends-team-compass#117

Closed

bollwyvl mentioned this pull request May 1, 2021

Using Poetry for building JupyterLab Prebuilt Extension jupyterlab/jupyterlab#10179

Open

telamonian mentioned this pull request May 1, 2021

Imports in top-level __init__.py cause eager loading of most enterprise_gateway code jupyter-server/enterprise_gateway#960

Closed

telamonian reviewed May 1, 2021

View reviewed changes

bollwyvl mentioned this pull request Aug 2, 2021

JupyterLabs does not find config files when installed in __pypackages__ according to PEP582 (e.g. using pdm) jupyterlab/jupyterlab#10696

Closed

danielfrg mentioned this pull request Sep 23, 2021

Poetry for building danielfrg/jupyter-flex#97

Closed

blink1073 mentioned this pull request Mar 24, 2022

[FR] External Data Support pypa/setuptools#3191

Open

1 task

N-Coder mentioned this pull request Apr 5, 2022

Support for data_files python-poetry/poetry#890

Closed

2 tasks

blink1073 closed this Oct 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data/config path entry_points with minimal examples #209

data/config path entry_points with minimal examples #209

bollwyvl commented Nov 21, 2020 •

edited by jasongrout

Loading

maartenbreddels commented Nov 21, 2020

bollwyvl commented Nov 21, 2020

maartenbreddels commented Nov 22, 2020

bollwyvl commented Nov 22, 2020

bollwyvl commented Nov 22, 2020 •

edited

Loading

meeseeksmachine commented Feb 9, 2021

jasongrout commented Mar 4, 2021

bollwyvl commented Mar 10, 2021 •

edited

Loading

bollwyvl commented Mar 10, 2021

bollwyvl commented Mar 10, 2021

bollwyvl commented Mar 10, 2021

jasongrout commented Mar 10, 2021

jasongrout commented Mar 10, 2021

jasongrout commented Mar 11, 2021

bollwyvl commented Mar 11, 2021

bollwyvl commented Mar 11, 2021 •

edited

Loading

jasongrout commented Mar 11, 2021

jasongrout commented Mar 12, 2021 •

edited

Loading

jasongrout commented Mar 12, 2021 •

edited

Loading

jaraco commented Mar 13, 2021

jasongrout commented Mar 16, 2021

gaborbernat commented Mar 16, 2021

bollwyvl commented Mar 24, 2021

meeseeksmachine commented Apr 28, 2021

telamonian May 1, 2021

telamonian commented May 5, 2021

meeseeksmachine commented Nov 18, 2021

blink1073 commented Oct 28, 2022

	[options.entry_points]
	jupyter_config_paths =
	entry-point-example-setuptools = entry_point_example_setuptools:JUPYTER_CONFIG_PATHS
	jupyter_data_paths =
	entry-point-example-setuptools = entry_point_example_setuptools:JUPYTER_DATA_PATHS

	[tool.flit.entrypoints.jupyter_config_paths]
	entry-point-example-flit = "entry_point_example_flit:JUPYTER_CONFIG_PATHS"

	[tool.flit.entrypoints.jupyter_data_paths]
	entry-point-example-flit = "entry_point_example_flit:JUPYTER_DATA_PATHS"

data/config path entry_points with minimal examples #209

data/config path entry_points with minimal examples #209

Conversation

bollwyvl commented Nov 21, 2020 • edited by jasongrout Loading

Background

Problem

Proposed solution

Problems with the proposed solution

Alternative solutions

Example

Original issue description

maartenbreddels commented Nov 21, 2020

bollwyvl commented Nov 21, 2020

maartenbreddels commented Nov 22, 2020

bollwyvl commented Nov 22, 2020

bollwyvl commented Nov 22, 2020 • edited Loading

meeseeksmachine commented Feb 9, 2021

jasongrout commented Mar 4, 2021

bollwyvl commented Mar 10, 2021 • edited Loading

bollwyvl commented Mar 10, 2021

bollwyvl commented Mar 10, 2021

bollwyvl commented Mar 10, 2021

jasongrout commented Mar 10, 2021

jasongrout commented Mar 10, 2021

jasongrout commented Mar 11, 2021

bollwyvl commented Mar 11, 2021

bollwyvl commented Mar 11, 2021 • edited Loading

jasongrout commented Mar 11, 2021

jasongrout commented Mar 12, 2021 • edited Loading

jasongrout commented Mar 12, 2021 • edited Loading

jaraco commented Mar 13, 2021

jasongrout commented Mar 16, 2021

gaborbernat commented Mar 16, 2021

bollwyvl commented Mar 24, 2021

meeseeksmachine commented Apr 28, 2021

telamonian May 1, 2021

Choose a reason for hiding this comment

telamonian commented May 5, 2021

meeseeksmachine commented Nov 18, 2021

blink1073 commented Oct 28, 2022

bollwyvl commented Nov 21, 2020 •

edited by jasongrout

Loading

bollwyvl commented Nov 22, 2020 •

edited

Loading

bollwyvl commented Mar 10, 2021 •

edited

Loading

bollwyvl commented Mar 11, 2021 •

edited

Loading

jasongrout commented Mar 12, 2021 •

edited

Loading

jasongrout commented Mar 12, 2021 •

edited

Loading