Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added non-git source puller functionality #194

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 35 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
ea87f2b
Command-line argument repo_dir is changed
sean-morris Jun 24, 2021
10385bb
Added non-git source puller functionality
sean-morris Jun 24, 2021
ab80daf
Added async functionality to non-git archives
sean-morris Aug 11, 2021
71ca2f4
Update nbgitpuller/plugin_helper.py
sean-morris Nov 3, 2021
ae66e53
Update nbgitpuller/hookspecs.py
sean-morris Nov 3, 2021
8934f5f
renamed and simplified the test_files
sean-morris Nov 4, 2021
ac2072c
added README to plugins
sean-morris Nov 4, 2021
a84096d
added docstring to progress_loop function
sean-morris Nov 4, 2021
86fd7bf
Update tests/test_download_puller.py
sean-morris Nov 4, 2021
c686651
Update tests/test_download_puller.py
sean-morris Nov 4, 2021
f8e04f1
Removed Downloader Plugins from Repo
sean-morris Nov 6, 2021
958b0b1
Added Custom Exception for Bad Provider
sean-morris Nov 6, 2021
2048e8d
Merge branch 'main' of https://github.com/jupyterhub/nbgitpuller
sean-morris Nov 8, 2021
398a03f
merged from master and fixed conflicts
sean-morris Nov 8, 2021
9a8fcab
Removed unused import from test file
sean-morris Nov 8, 2021
78e31c3
Added packages to dev-requirements.txt
sean-morris Nov 8, 2021
a131b93
Moved the two constants and REPO_PARENT_DIR out of __init__.py
sean-morris Nov 10, 2021
55da5e1
Revert some trivial formatting changes
consideRatio Nov 17, 2021
0ca6cf9
Apply suggestions from code review
sean-morris Nov 17, 2021
9e808e5
Changes from code review
sean-morris Nov 17, 2021
8d63ee4
Apply suggestions from code review
sean-morris Nov 19, 2021
deecc7b
Removed setTerminalVisibility from automatically opening in UI
sean-morris Nov 23, 2021
a9e08c4
Reverted a mistaken change to command-line args
sean-morris Nov 23, 2021
09c9249
Hookspecs renamed and documented
sean-morris Nov 23, 2021
0085fab
Hookspecs name and seperate helper_args
sean-morris Nov 23, 2021
88ec806
Renamed for clarity
sean-morris Nov 24, 2021
8592d1f
Seperated actual query_line_args from helper_args
sean-morris Nov 24, 2021
21d8f0f
fixed conflicts
sean-morris Nov 24, 2021
ab5dd10
Fixed tests
sean-morris Nov 24, 2021
e8ae5ca
Removed changes not meant to merged
sean-morris Nov 26, 2021
56ad1ee
Apply suggestions from code review
sean-morris Nov 29, 2021
af567ca
Refactored docstrings
sean-morris Nov 29, 2021
782a35b
Refactored docstrings
sean-morris Nov 29, 2021
d034d37
Merge branch 'non-git' of https://github.com/sean-morris/nbgitpuller …
sean-morris Nov 29, 2021
9729464
Fix temp download dir to use the package tempfile
sean-morris Nov 30, 2021
602ef01
provider is now contentProvider in the html/js/query parameters
sean-morris Nov 30, 2021
3ebdc7e
The download_func and download_func_params brought in separately
sean-morris Nov 30, 2021
e22d076
Moved the handle_files_helper in Class
sean-morris Dec 1, 2021
3b14405
Moved downloader-plugin util to own repo
sean-morris Dec 20, 2021
613f863
Moved downloader-plugin util to own repo
sean-morris Dec 20, 2021
5f39c68
Merge branch 'non-git' of https://github.com/sean-morris/nbgitpuller …
sean-morris Dec 20, 2021
f618560
Removed nested_asyncio from init.py
sean-morris Jan 11, 2022
367f3c7
Moved downloader-plugin handling to puller thread
sean-morris Jan 15, 2022
8893970
Moved downloader plugins handling to pull.py
sean-morris Jan 19, 2022
7590c38
Access downloader-plugin results from plugin instance variable
sean-morris Jan 19, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,14 @@ data8assets/
summer/
test-repo/
venv/
.idea/

.ipynb_checkpoints
docs/_build

jupyterhub.sqlite
jupyterhub_cookie_secret

node_modules/
package-lock.json

nbgitpuller/static/dist
nbgitpuller/static/dist
2 changes: 2 additions & 0 deletions dev-requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,5 @@ pytest
pytest-cov
flake8
nbclassic
aioresponses
pytest-asyncio
6 changes: 6 additions & 0 deletions nbgitpuller/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,12 @@
from notebook.utils import url_path_join
from tornado.web import StaticFileHandler
import os
import nest_asyncio

# this allows us to nest usage of the event_loop from asyncio
# being used by tornado in jupyter distro
# Ref: https://medium.com/@vyshali.enukonda/how-to-get-around-runtimeerror-this-event-loop-is-already-running-3f26f67e762e
nest_asyncio.apply()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This monkeypatches the asyncio event loop, is this event loop shared across the whole of jupyter-server? If so are there any potential issues that might be caused in jupyter-server, extensions, or kernels that we should be aware of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@consideRatio @manics Erik do you have any sense of this? I completely based this solution on the reference I linked to in the comments. He seemed to be trying to solve a similar nested event loop issue in Jupyter.

I read a bunch more pieces.

  • Here is a discussion about where the error occurs(Runtime error related to trying to start an already running loop); the ipykernel seems to be the issue. The nested_asyncio solution becomes a part of the thread in August 2020.
  • Here is another from the jupyter blog in 2018; at this point, they don't want to integrate the nested_asyncio package until there is more testing.
  • I also tried upgrading the tornado and ipykernel packages; a random comment seemed to indicate it would solve the problem; after upgrading tornado to version 6.1(no change in my environment) and ipykernel to 6.6.1(up from 6.6.0) -- it didn't work. I still need the nested_asyncio package.

pip3 install --upgrade tornado
pip3 install --upgrade ipykernel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the nest_asyncio calls to the nbgitpuller-downloader-plugins themselves as well as added notes to the documentation that the import and call to nest_asyncio need to be included in any nbgitpuller-downloader-plugin.



def _jupyter_server_extension_paths():
Expand Down
128 changes: 99 additions & 29 deletions nbgitpuller/handlers.py
Original file line number Diff line number Diff line change
@@ -1,17 +1,27 @@
from tornado import gen, web, locks
import traceback
import urllib.parse

from notebook.base.handlers import IPythonHandler
import threading
import json
import os
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
from queue import Queue, Empty
from queue import Queue
import jinja2

from .pull import GitPuller
from .version import __version__
from . import plugin_hook_specs
import pluggy


class ContentProviderException(Exception):
"""
Custom Exception thrown when the content_provider key specifying
the downloader plugin is not installed or can not be found by the
name given
"""
def __init__(self, response=None):
self.response = response

class SyncHandler(IPythonHandler):
def __init__(self, *args, **kwargs):
Expand All @@ -38,6 +48,54 @@ def emit(self, data):
self.write('data: {}\n\n'.format(serialized_data))
yield self.flush()

def setup_plugins(self, content_provider):
"""
This automatically searches for and loads packages whose entrypoint is nbgitpuller. If found,
the plugin manager object is returned and used to execute the hook implemented by
the plugin.
:param content_provider: this is the name of the content_provider; each plugin is named to identify the
content_provider of the archive to be loaded(e.g. googledrive, dropbox, etc)
:return: returns the PluginManager object used to call the implemented hooks of the plugin
:raises: ContentProviderException -- this occurs when the content_provider parameter is not found
"""
plugin_manager = pluggy.PluginManager("nbgitpuller")
plugin_manager.add_hookspecs(plugin_hook_specs)
num_loaded = plugin_manager.load_setuptools_entrypoints("nbgitpuller", name=content_provider)
if num_loaded == 0:
raise ContentProviderException(f"The content_provider key you supplied in the URL could not be found: {content_provider}")
return plugin_manager

@gen.coroutine
def _wait_for_sync_progress_queue(self, queue):
"""
The loop below constantly checks the queue parameter for messages
that are being sent to the UI so the user is kept aware of progress related to
the downloading of archives and the merging of files into the user's home folder

:param queue: download_queue or the original pull queue
"""
while True:
if queue.empty():
yield gen.sleep(0.5)
continue
progress = queue.get_nowait()
if progress is None:
return
if isinstance(progress, Exception):
self.emit({
'phase': 'error',
'message': str(progress),
'output': '\n'.join([
line.strip()
for line in traceback.format_exception(
type(progress), progress, progress.__traceback__
)
])
})
return

self.emit({'output': progress, 'phase': 'syncing'})

@web.authenticated
@gen.coroutine
def get(self):
Expand All @@ -53,6 +111,7 @@ def get(self):
try:
repo = self.get_argument('repo')
branch = self.get_argument('branch', None)
content_provider = self.get_argument('content_provider', None)
depth = self.get_argument('depth', None)
if depth:
depth = int(depth)
Expand All @@ -73,8 +132,21 @@ def get(self):
self.set_header('content-type', 'text/event-stream')
self.set_header('cache-control', 'no-cache')

gp = GitPuller(repo, repo_dir, branch=branch, depth=depth, parent=self.settings['nbapp'])
# if content_provider is specified then we are dealing with compressed
# archive and not a git repo
if content_provider is not None:
plugin_manager = self.setup_plugins(content_provider)
query_line_args = {k: v[0].decode() for k, v in self.request.arguments.items()}
download_q = Queue()
helper_args = dict()
helper_args["wait_for_sync_progress_queue"] = lambda: self._wait_for_sync_progress_queue(download_q)
helper_args["download_q"] = download_q
helper_args["repo_parent_dir"] = repo_parent_dir
results = plugin_manager.hook.handle_files(helper_args=helper_args,query_line_args=query_line_args)
repo_dir = repo_parent_dir + results["output_dir"]
repo = "file://" + results["origin_repo_path"]

gp = GitPuller(repo, repo_dir, branch=branch, depth=depth, parent=self.settings['nbapp'])
q = Queue()

def pull():
Expand All @@ -87,33 +159,21 @@ def pull():
q.put_nowait(e)
raise e
self.gp_thread = threading.Thread(target=pull)

self.gp_thread.start()

while True:
try:
progress = q.get_nowait()
except Empty:
yield gen.sleep(0.5)
continue
if progress is None:
break
if isinstance(progress, Exception):
self.emit({
'phase': 'error',
'message': str(progress),
'output': '\n'.join([
line.strip()
for line in traceback.format_exception(
type(progress), progress, progress.__traceback__
)
])
})
return

self.emit({'output': progress, 'phase': 'syncing'})

yield self._wait_for_sync_progress_queue(q)
self.emit({'phase': 'finished'})

except ContentProviderException as pe:
self.emit({
'phase': 'error',
'message': str(pe),
'output': '\n'.join([
line.strip()
for line in traceback.format_exception(
type(pe), pe, pe.__traceback__
)
])
})
except Exception as e:
self.emit({
'phase': 'error',
Expand Down Expand Up @@ -151,6 +211,7 @@ def get(self):
repo = self.get_argument('repo')
consideRatio marked this conversation as resolved.
Show resolved Hide resolved
branch = self.get_argument('branch', None)
depth = self.get_argument('depth', None)
content_provider = self.get_argument('content_provider', None)
urlPath = self.get_argument('urlpath', None) or \
self.get_argument('urlPath', None)
subPath = self.get_argument('subpath', None) or \
Expand All @@ -171,10 +232,19 @@ def get(self):
else:
path = 'tree/' + path

if content_provider is not None:
path = "tree/"

self.write(
self.render_template(
'status.html',
repo=repo, branch=branch, path=path, depth=depth, targetpath=targetpath, version=__version__
repo=repo,
branch=branch,
path=path,
depth=depth,
provider=content_provider,
targetpath=targetpath,
version=__version__
))
self.flush()

Expand Down
Loading