Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added non-git source puller functionality #194

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
ea87f2b
Command-line argument repo_dir is changed
sean-morris Jun 24, 2021
10385bb
Added non-git source puller functionality
sean-morris Jun 24, 2021
ab80daf
Added async functionality to non-git archives
sean-morris Aug 11, 2021
71ca2f4
Update nbgitpuller/plugin_helper.py
sean-morris Nov 3, 2021
ae66e53
Update nbgitpuller/hookspecs.py
sean-morris Nov 3, 2021
8934f5f
renamed and simplified the test_files
sean-morris Nov 4, 2021
ac2072c
added README to plugins
sean-morris Nov 4, 2021
a84096d
added docstring to progress_loop function
sean-morris Nov 4, 2021
86fd7bf
Update tests/test_download_puller.py
sean-morris Nov 4, 2021
c686651
Update tests/test_download_puller.py
sean-morris Nov 4, 2021
f8e04f1
Removed Downloader Plugins from Repo
sean-morris Nov 6, 2021
958b0b1
Added Custom Exception for Bad Provider
sean-morris Nov 6, 2021
2048e8d
Merge branch 'main' of https://github.com/jupyterhub/nbgitpuller
sean-morris Nov 8, 2021
398a03f
merged from master and fixed conflicts
sean-morris Nov 8, 2021
9a8fcab
Removed unused import from test file
sean-morris Nov 8, 2021
78e31c3
Added packages to dev-requirements.txt
sean-morris Nov 8, 2021
a131b93
Moved the two constants and REPO_PARENT_DIR out of __init__.py
sean-morris Nov 10, 2021
55da5e1
Revert some trivial formatting changes
consideRatio Nov 17, 2021
0ca6cf9
Apply suggestions from code review
sean-morris Nov 17, 2021
9e808e5
Changes from code review
sean-morris Nov 17, 2021
8d63ee4
Apply suggestions from code review
sean-morris Nov 19, 2021
deecc7b
Removed setTerminalVisibility from automatically opening in UI
sean-morris Nov 23, 2021
a9e08c4
Reverted a mistaken change to command-line args
sean-morris Nov 23, 2021
09c9249
Hookspecs renamed and documented
sean-morris Nov 23, 2021
0085fab
Hookspecs name and seperate helper_args
sean-morris Nov 23, 2021
88ec806
Renamed for clarity
sean-morris Nov 24, 2021
8592d1f
Seperated actual query_line_args from helper_args
sean-morris Nov 24, 2021
21d8f0f
fixed conflicts
sean-morris Nov 24, 2021
ab5dd10
Fixed tests
sean-morris Nov 24, 2021
e8ae5ca
Removed changes not meant to merged
sean-morris Nov 26, 2021
56ad1ee
Apply suggestions from code review
sean-morris Nov 29, 2021
af567ca
Refactored docstrings
sean-morris Nov 29, 2021
782a35b
Refactored docstrings
sean-morris Nov 29, 2021
d034d37
Merge branch 'non-git' of https://github.com/sean-morris/nbgitpuller …
sean-morris Nov 29, 2021
9729464
Fix temp download dir to use the package tempfile
sean-morris Nov 30, 2021
602ef01
provider is now contentProvider in the html/js/query parameters
sean-morris Nov 30, 2021
3ebdc7e
The download_func and download_func_params brought in separately
sean-morris Nov 30, 2021
e22d076
Moved the handle_files_helper in Class
sean-morris Dec 1, 2021
3b14405
Moved downloader-plugin util to own repo
sean-morris Dec 20, 2021
613f863
Moved downloader-plugin util to own repo
sean-morris Dec 20, 2021
5f39c68
Merge branch 'non-git' of https://github.com/sean-morris/nbgitpuller …
sean-morris Dec 20, 2021
f618560
Removed nested_asyncio from init.py
sean-morris Jan 11, 2022
367f3c7
Moved downloader-plugin handling to puller thread
sean-morris Jan 15, 2022
8893970
Moved downloader plugins handling to pull.py
sean-morris Jan 19, 2022
7590c38
Access downloader-plugin results from plugin instance variable
sean-morris Jan 19, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions nbgitpuller/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,16 @@
from notebook.utils import url_path_join
from tornado.web import StaticFileHandler
import os
import nest_asyncio

REPO_PARENT_DIR = None
consideRatio marked this conversation as resolved.
Show resolved Hide resolved
TEMP_DOWNLOAD_REPO_DIR = "/tmp/temp_download_repo"
CACHED_ORIGIN_NON_GIT_REPO = ".nbgitpuller/targets/"

# this allows us to nest usage of the event_loop from asyncio
# being used by tornado in jupyter distro
# Ref: https://medium.com/@vyshali.enukonda/how-to-get-around-runtimeerror-this-event-loop-is-already-running-3f26f67e762e
nest_asyncio.apply()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This monkeypatches the asyncio event loop, is this event loop shared across the whole of jupyter-server? If so are there any potential issues that might be caused in jupyter-server, extensions, or kernels that we should be aware of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@consideRatio @manics Erik do you have any sense of this? I completely based this solution on the reference I linked to in the comments. He seemed to be trying to solve a similar nested event loop issue in Jupyter.

I read a bunch more pieces.

  • Here is a discussion about where the error occurs(Runtime error related to trying to start an already running loop); the ipykernel seems to be the issue. The nested_asyncio solution becomes a part of the thread in August 2020.
  • Here is another from the jupyter blog in 2018; at this point, they don't want to integrate the nested_asyncio package until there is more testing.
  • I also tried upgrading the tornado and ipykernel packages; a random comment seemed to indicate it would solve the problem; after upgrading tornado to version 6.1(no change in my environment) and ipykernel to 6.6.1(up from 6.6.0) -- it didn't work. I still need the nested_asyncio package.

pip3 install --upgrade tornado
pip3 install --upgrade ipykernel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the nest_asyncio calls to the nbgitpuller-downloader-plugins themselves as well as added notes to the documentation that the import and call to nest_asyncio need to be included in any nbgitpuller-downloader-plugin.



def _jupyter_server_extension_paths():
Expand Down
114 changes: 79 additions & 35 deletions nbgitpuller/handlers.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
from tornado import gen, web, locks
import traceback
import urllib.parse

from notebook.base.handlers import IPythonHandler
import threading
import json
Expand All @@ -11,6 +10,9 @@

from .pull import GitPuller
from .version import __version__
from . import hookspecs
import pluggy
import nbgitpuller


class SyncHandler(IPythonHandler):
Expand Down Expand Up @@ -38,6 +40,38 @@ def emit(self, data):
self.write('data: {}\n\n'.format(serialized_data))
yield self.flush()

def setup_plugins(self, provider):
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
pm = pluggy.PluginManager("nbgitpuller")
pm.add_hookspecs(hookspecs)
pm.load_setuptools_entrypoints("nbgitpuller", name=provider)
return pm

@gen.coroutine
def progress_loop(self, queue):
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
while True:
try:
progress = queue.get_nowait()
except Empty:
yield gen.sleep(0.1)
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
continue
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
if progress is None:
yield gen.sleep(5)
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
consideRatio marked this conversation as resolved.
Show resolved Hide resolved
return
if isinstance(progress, Exception):
self.emit({
'phase': 'error',
'message': str(progress),
'output': '\n'.join([
line.strip()
for line in traceback.format_exception(
type(progress), progress, progress.__traceback__
)
])
})
return

self.emit({'output': progress, 'phase': 'syncing'})

@web.authenticated
@gen.coroutine
def get(self):
Expand All @@ -53,6 +87,7 @@ def get(self):
try:
repo = self.get_argument('repo')
branch = self.get_argument('branch', None)
provider = self.get_argument('provider', None)
depth = self.get_argument('depth', None)
if depth:
depth = int(depth)
Expand All @@ -65,16 +100,31 @@ def get(self):
# so that all repos are always in scope after cloning. Sometimes
# server_root_dir will include things like `~` and so the path
# must be expanded.
repo_parent_dir = os.path.join(os.path.expanduser(self.settings['server_root_dir']),
os.getenv('NBGITPULLER_PARENTPATH', ''))
repo_dir = os.path.join(repo_parent_dir, self.get_argument('targetpath', repo.split('/')[-1]))
repo_parent_dir = os.path.join(os.path.expanduser(self.settings['server_root_dir']), os.getenv('NBGITPULLER_PARENTPATH', ''))
nbgitpuller.REPO_PARENT_DIR = repo_parent_dir

repo_dir = os.path.join(
repo_parent_dir,
self.get_argument('targetpath', repo.split('/')[-1]))
sean-morris marked this conversation as resolved.
Show resolved Hide resolved

# We gonna send out event streams!
self.set_header('content-type', 'text/event-stream')
self.set_header('cache-control', 'no-cache')

gp = GitPuller(repo, repo_dir, branch=branch, depth=depth, parent=self.settings['nbapp'])
# if provider is specified then we are dealing with compressed
# archive and not a git repo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great and important comment, because "provider" isn't by itself obvious. I think it would be nice to have this named "content provider" or similar, where "git" is the default if unset for example. If we don't rename provider to something more explicit like content provider, I think we should at least write our documentation surrounding it as if it were a content provider that by default is "git" - but plugins can make us support non-git content providers.

In practice I figure what we do in the code below, is that when we have a non-git content provider, we let it do its custom logic over unknown time while reporting its progress via the "progress_loop" function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change across the board. The only issue was that I had trouble rendering the template for status.html if I included an underscore in the parameter. I couldn't seem to find any documentation on this. I will look more but for now the template rendering of status.html says provider but everywhere else is content_provider

if provider is not None:
pm = self.setup_plugins(provider)
req_args = {k: v[0].decode() for k, v in self.request.arguments.items()}
download_q = Queue()
req_args["progress_func"] = lambda: self.progress_loop(download_q)
req_args["download_q"] = download_q
hf_args = {"query_line_args": req_args}
results = pm.hook.handle_files(**hf_args)
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
repo_dir = repo_parent_dir + results["unzip_dir"]
repo = "file://" + results["origin_repo_path"]

gp = GitPuller(repo, repo_dir, branch=branch, depth=depth, parent=self.settings['nbapp'])
q = Queue()

def pull():
Expand All @@ -87,33 +137,11 @@ def pull():
q.put_nowait(e)
raise e
self.gp_thread = threading.Thread(target=pull)

self.gp_thread.start()

while True:
try:
progress = q.get_nowait()
except Empty:
yield gen.sleep(0.5)
continue
if progress is None:
break
if isinstance(progress, Exception):
self.emit({
'phase': 'error',
'message': str(progress),
'output': '\n'.join([
line.strip()
for line in traceback.format_exception(
type(progress), progress, progress.__traceback__
)
])
})
return

self.emit({'output': progress, 'phase': 'syncing'})

self.progress_loop(q)
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
yield gen.sleep(3)
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
self.emit({'phase': 'finished'})

except Exception as e:
self.emit({
'phase': 'error',
Expand Down Expand Up @@ -147,18 +175,18 @@ def initialize(self):
@gen.coroutine
def get(self):
app_env = os.getenv('NBGITPULLER_APP', default='notebook')

repo = self.get_argument('repo')
consideRatio marked this conversation as resolved.
Show resolved Hide resolved
branch = self.get_argument('branch', None)
depth = self.get_argument('depth', None)
provider = self.get_argument('provider', None)
urlPath = self.get_argument('urlpath', None) or \
self.get_argument('urlPath', None)
self.get_argument('urlPath', None)
consideRatio marked this conversation as resolved.
Show resolved Hide resolved
subPath = self.get_argument('subpath', None) or \
self.get_argument('subPath', '.')
self.get_argument('subPath', '.')
consideRatio marked this conversation as resolved.
Show resolved Hide resolved
app = self.get_argument('app', app_env)
parent_reldir = os.getenv('NBGITPULLER_PARENTPATH', '')
targetpath = self.get_argument('targetpath', None) or \
self.get_argument('targetPath', repo.split('/')[-1])
self.get_argument('targetPath', repo.split('/')[-1])
consideRatio marked this conversation as resolved.
Show resolved Hide resolved

if urlPath:
path = urlPath
Expand All @@ -171,10 +199,19 @@ def get(self):
else:
path = 'tree/' + path

if provider is not None:
path = "tree/"

self.write(
self.render_template(
'status.html',
repo=repo, branch=branch, path=path, depth=depth, targetpath=targetpath, version=__version__
repo=repo,
branch=branch,
path=path,
depth=depth,
provider=provider,
targetpath=targetpath,
version=__version__
))
self.flush()

Expand Down Expand Up @@ -209,3 +246,10 @@ def get(self):
)

self.redirect(new_url)


class ThreadWithResult(threading.Thread):
def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None):
def function():
self.result = target(*args, **kwargs)
super().__init__(group=group, target=function, name=name, daemon=daemon)
22 changes: 22 additions & 0 deletions nbgitpuller/hookspecs.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import pluggy
sean-morris marked this conversation as resolved.
Show resolved Hide resolved

hookspec = pluggy.HookspecMarker("nbgitpuller")
hookimpl = pluggy.HookimplMarker("nbgitpuller")
sean-morris marked this conversation as resolved.
Show resolved Hide resolved


@hookspec(firstresult=True)
def handle_files(query_line_args):
"""
:param json query_line_args: this includes any argument you put on the url
:return two parameter json unzip_dir and origin_repo_path
:rtype json object

The developer uses this function to download, un-compress and save the
source files to the TEMP_DOWNLOAD_REPO_DIR folder.

The parameter, query_line_args, is any argument you put on the URL

Once the files are saved to the directly, git puller can handle all the
sean-morris marked this conversation as resolved.
Show resolved Hide resolved
standard functions needed to make sure source files are updated or created
as needed.
"""
Loading