Handling large number of files #538
Comments
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
I created a draft pull request, #539. Together with my other commit cnydw/jupyterlab@6e615c0 on the JupyterLab frontend, it can open a folder with 100,000 files without problems. The two commits I made are just a POC; the API changes can certainly be improved. I think it makes sense to first make the backend API changes in jupyter_server, then propagate the frontend changes to JupyterLab and Jupyter Notebook accordingly.
Hi, any updates on getting this merged?
At the moment, when a user opens a folder from Notebook or JupyterLab, jupyter_server reads every file inside the folder using `os.lstat`, which is very costly for a large number of files (see `jupyter_server/jupyter_server/services/contents/filemanager.py`, lines 262 to 271 at commit 51e3ec3).
This makes it practically impossible to open a folder containing a large number of files: the backend freezes for a long time before becoming responsive again, and even when the backend does return the data, the frontend crashes while rendering all the files. See jupyterlab/jupyterlab#8700.
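To illustrate the cost being described, here is a minimal sketch contrasting the per-entry `os.lstat` pattern with `os.scandir`, whose `DirEntry.stat()` can reuse information gathered while reading the directory and so avoid a separate path lookup per file on some platforms. This is illustrative only, not code from jupyter_server:

```python
import os


def list_dir_lstat(path):
    # The costly pattern: list names first, then issue one extra
    # lstat system call per entry via a full path lookup.
    entries = []
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))
        entries.append((name, st.st_size))
    return entries


def list_dir_scandir(path):
    # os.scandir yields DirEntry objects; entry.stat() may be served
    # from data cached during the directory read, which is typically
    # much cheaper for directories with many files.
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)
            entries.append((entry.name, st.st_size))
    return entries
```

Both functions return the same (name, size) pairs; only the number and kind of system calls differ.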
It would be nice to improve this architecture by using paging or another mechanism to read the directory contents incrementally.
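The paging idea could be sketched as follows. The `offset` and `limit` parameters here are hypothetical, chosen for illustration; jupyter_server's actual Contents API does not accept them:

```python
import os


def list_dir_page(path, offset=0, limit=100):
    """Return one page of directory entry names plus the total count.

    A sketch of paging a directory listing: enumerate names cheaply
    (no per-file stat), sort for a stable page order, then slice.
    The response shape loosely mimics a Contents-style payload but
    is an assumption, not the real jupyter_server model.
    """
    with os.scandir(path) as it:
        names = sorted(entry.name for entry in it)
    total = len(names)
    page = names[offset:offset + limit]
    return {"total": total, "offset": offset, "content": page}
```

A frontend could then request pages lazily as the user scrolls, so neither the backend nor the renderer ever touches all 100,000 entries at once.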