Hack in fragile GitLab support #879

Open
wants to merge 8 commits into
base: main

@cbowdon cbowdon commented Jan 24, 2020

Hi! This isn't merge-worthy I'm afraid, but I'm submitting the PR in case it helps someone else who wants to do the same thing. The changes add support for viewing notebooks on private GitLab instances (but sadly only notebooks, not trees or anything else).

With this PR, any domains that start with gitlab will use a handler that assumes an environment variable GITLAB_TOKEN which is your private access token for a v4 GitLab API.
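As a rough illustration of the behaviour described above (not the PR's actual code), a handler could pick out GitLab hosts by prefix and read the token from the environment. The helper names `is_gitlab_host`, `gitlab_api_base` and `gitlab_auth_headers` are hypothetical, and sending the token in the `PRIVATE-TOKEN` header is one of the options GitLab documents; the PR itself may pass the token differently.

```python
import os


def is_gitlab_host(host: str) -> bool:
    """Mirror the rule described above: hosts whose name starts with 'gitlab'."""
    return host.lower().startswith("gitlab")


def gitlab_api_base(host: str) -> str:
    """Base URL of the v4 REST API on the given host (assumes HTTPS)."""
    return f"https://{host}/api/v4"


def gitlab_auth_headers(environ=os.environ) -> dict:
    """Build request headers from GITLAB_TOKEN, failing early if it is unset."""
    token = environ.get("GITLAB_TOKEN")
    if not token:
        raise RuntimeError("GITLAB_TOKEN must be set to a GitLab access token")
    return {"PRIVATE-TOKEN": token}
```

Failing early when the variable is missing gives a clearer error than an unauthenticated request that later returns 401 or 404.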

Any domains that start with `gitlab` will use a handler that assumes
an environment variable GITLAB_TOKEN which is your private access
token for a v4 GitLab API.
@krinsman
Collaborator

This definitely looks like a really good start to me though! It actually looks like it hits all of the right places, as far as I can tell. In fact I'm not actually convinced it shouldn't be merged. I guess we can discuss that more if you want (keep in mind that I only have triage access not write access).

I guess in order to view trees and not just notebooks it would be necessary or at least useful to write a gitlab client analogous to the github client for interfacing more thoroughly with the GitLab API.
e.g. see here: https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/github/client.py

I've always suspected it might actually be fairly easy to copy and modify the code for the GitHub client and handlers to work for GitLab if one had knowledge of the GitLab API. I've never tried it myself though because I don't have any experience and don't know how to get acquainted with it.

Anyway though GitLab integration with NBViewer would actually be a really useful feature e.g. at NERSC where a lot of repos are hosted on GitLab. @rcthomas Does NERSC have private Gitlab tokens that they might want to use with this?

Yeah actually I'm not sure why this shouldn't be merged -- it doesn't seem like it affects any default behavior, except for URLs beginning with gitlab.com, and it seems to affect the default behavior in those cases in a way that would be desirable.

@krinsman added the tag:GitLab, tag:Provider, tag:Sprint Friendly, and type:Enhancement labels Jan 24, 2020
@krinsman krinsman self-requested a review January 24, 2020 18:34
Collaborator

@krinsman krinsman left a comment

I recall @parente being somewhat uncomfortable about adding the provider handlers as traitlets configurables, but inasmuch as that was my bad idea, I approve of it being extended here to include a new GitLab provider. This also uses default_rewrites and uri_rewrites much more effectively and intelligently than I ever did.

The structure of the GitLab provider also looks really good: all of the code is in the right folders, the GET methods seem to demonstrate a really good understanding of the GitLab URL scheme, there's good exception handling, and default_handlers is used correctly. The changes are also non-intrusive, since they only affect viewing GitLab notebooks. If this were merged, we could let it be tested in the wild for a while before updating the public NBViewer service to incorporate it, although I wouldn't anticipate any issues if it were incorporated.

Admittedly my judgment might be clouded by the fact that I'm flattered that the changes I made were understandable enough that someone else was able to copy, modify, and extend them, but regardless this actually looks really good to me. I certainly approve.

@krinsman krinsman requested a review from parente January 24, 2020 18:41
We'll be using `path_type` to identify whether to render the notebook
directly or render a list view.
- Lookup blobs directly where possible
- Fall back to searching project trees
- Add logs for HTTP Errors
- Remove info log messages with private tokens in URLs
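The commit above removes the offending log messages outright. Where a URL still has to be logged, an alternative (not what the commit does) is to mask the token first. A minimal sketch, assuming the token travels in a `private_token` query parameter; `redact_token` is a hypothetical helper:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_token(url: str, param: str = "private_token") -> str:
    """Return url with the value of the token query parameter masked."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # The query string is re-encoded, so other parameters may be
    # normalised (for example, spaces become '+'); fine for log output.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A call such as `log.info("Fetching %s", redact_token(url))` would then keep the path and other parameters visible without leaking the secret.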
@krinsman
Collaborator

@cbowdon Maybe this would be even more likely to be merged if you copied some of the tests from /providers/github/tests/test_github.py (I think this is the appropriate test file in /providers/github/tests/ to copy from), modified them to be appropriate for GitLab, and put them in a /providers/gitlab/tests/ directory.

I'm not sure which specific tests should be copied and modified at the moment, although if you want, and I can find the free time, I could possibly look into it for you.

@cbowdon
Author

cbowdon commented Jan 29, 2020

@krinsman Sure, I can do that. I've actually managed to find a little time to do the client refactoring you suggested and start to implement tree support. Progress will be very slow I'm afraid, but I'm blown away by the more welcoming and enthusiastic response than I expected 😄

@krinsman
Collaborator

krinsman commented Feb 3, 2020

@cbowdon Honestly I'm kind of blown away that you didn't expect an enthusiastic response, haha! This is a really good PR.

Anyway that sounds great! GitLab support is a feature I have really wanted to see added to NBViewer, and I am glad that someone is working on it!

Adds support for rendering the directory view with breadcrumbs.

Last modified time is not included yet.
@krinsman
Collaborator

krinsman commented Feb 4, 2020

@cbowdon Oh wow these extra new commits look really good!

These are really impressive actually. When you said that "progress will be very slow", I thought you meant months, not days!

Please feel free to let me know when you are finished and want me to review the final product again.

Also prevents the fallback lookup method failing because we hit the
number of project search results.
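How the commit avoids the result limit is not shown here. One common approach, sketched below purely as an assumption, is to walk every page of search results instead of stopping at the first. GitLab's offset pagination accepts `page` and `per_page` parameters and reports the following page in the `X-Next-Page` response header, so a real `fetch_page` could be built on those; here it is an injected, hypothetical callable so the sketch stays self-contained.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# fetch_page(page) -> (items on that page, next page number or None)
FetchPage = Callable[[int], Tuple[List[dict], Optional[int]]]


def iter_all(fetch_page: FetchPage, max_pages: int = 50) -> Iterator[dict]:
    """Yield items from every page, stopping after max_pages as a safety cap."""
    page: Optional[int] = 1
    fetched = 0
    while page is not None and fetched < max_pages:
        items, page = fetch_page(page)
        fetched += 1
        yield from items


def find_project(fetch_page: FetchPage, path_with_namespace: str) -> Optional[dict]:
    """Return the first project whose full path matches, or None."""
    for project in iter_all(fetch_page):
        if project.get("path_with_namespace") == path_with_namespace:
            return project
    return None
```

Because `iter_all` is a generator, the search stops requesting pages as soon as a match is found.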
@krinsman
Collaborator

@cbowdon I like these new changes!

Let me know when you want me to look at the code for this PR again, i.e. when you think it might be ready for merging.

@cbowdon
Author

cbowdon commented Feb 28, 2020

@krinsman Thanks! I'm dogfooding this PR internally and keep discovering little problems, not all fixed yet. Will let you know as soon as it's ready for review.

@krinsman
Collaborator

Haha, very nice, I just learned that term ("dogfooding") today from you. Looking forward to hearing back about any updates!

@krinsman krinsman mentioned this pull request Mar 8, 2020
@cbowdon
Author

cbowdon commented Apr 29, 2020

@krinsman sorry for taking so long on this. Also sorry, I don't understand why that check has failed - would appreciate a hand on that.

I'm worried that if I leave this too long, merge conflicts will start to build up. Would you be open to merging it as-is? The outstanding problems are:

  • GitLab URLs with a - path segment in them don't work
  • It's not possible to browse the root of a GitLab repository
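For anyone picking up these two items, here is one possible way to parse such URL paths, offered as a sketch rather than as the PR's approach. It assumes newer GitLab web URLs of the form `<project>/-/blob/<ref>/<path>`, older ones without the `-` segment, and a ref that is a single path segment (branch names containing `/` would need an API lookup to disambiguate). `GitLabPath` and `parse_gitlab_path` are hypothetical names.

```python
from typing import NamedTuple, Optional


class GitLabPath(NamedTuple):
    project: str         # e.g. "group/subgroup/repo"
    kind: Optional[str]  # "blob", "tree", or None for the repository root
    ref: Optional[str]   # branch, tag, or commit (assumed to be one segment)
    path: str            # path inside the repository, "" for the root


def parse_gitlab_path(url_path: str) -> GitLabPath:
    segments = [s for s in url_path.split("/") if s]
    if "-" in segments:
        # Newer URL style: everything before the "-" segment is the project.
        cut = segments.index("-")
        project, rest = segments[:cut], segments[cut + 1:]
    else:
        # Older style: the project ends at the first "blob" or "tree" segment.
        cut = next(
            (i for i, s in enumerate(segments) if s in ("blob", "tree")),
            len(segments),
        )
        project, rest = segments[:cut], segments[cut:]
    if len(rest) < 2:
        # No kind/ref pair present: treat as the repository root.
        return GitLabPath("/".join(project), None, None, "")
    kind, ref, *path = rest
    return GitLabPath("/".join(project), kind, ref, "/".join(path))
```

A result with `kind` set to `None` could be routed to a tree listing of the default branch, which would cover the root-browsing case.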

The entire PR should only affect URLs starting with gitlab.

@robindebois
Contributor

robindebois commented Oct 3, 2020

@cbowdon Hi! The PR looks really interesting... just 2 questions:

  • how come it's not merged yet? It seems it only failed on some code formatting.
  • is support for subgroups missing or do I misunderstand how to use it?

Clarifying the latter

The GitLab docs mention that when making calls, e.g.

GET /projects/:id/repository/tree

the id here is (https://docs.gitlab.com/ee/api/repositories.html):

id (required) - The ID or URL-encoded path of the project owned by the authenticated user

In other words, the project at https://gitlab.cs.washington.edu/prescience/public-notebooks can be addressed with a call to either /projects/18288/repository/tree or /projects/prescience%2Fpublic-notebooks/repository/tree.

The code seems to create project_id from a regex with group and repo pattern, which works fine for simply structured repos but gitlab also supports subgroups. For example, look at this one: https://gitlab.com/power-progress-community/oshw-powerpc-notebook/software/mame. The correct API call here would be to:

api/v4/projects/power-progress-community%2Foshw-powerpc-notebook%2Fsoftware%2Fmame/repository/tree
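A small sketch of the encoding described above, assuming the full project path (including any subgroups) has already been extracted from the URL. `project_api_url` is a hypothetical helper, not a function from the PR.

```python
from urllib.parse import quote


def project_api_url(host: str, project_path: str,
                    endpoint: str = "repository/tree") -> str:
    """Build a v4 API URL using the URL-encoded project path as the id."""
    # safe="" makes quote() encode "/" as %2F, which GitLab expects here.
    project_id = quote(project_path.strip("/"), safe="")
    return f"https://{host}/api/v4/projects/{project_id}/{endpoint}"
```

Encoding the whole path this way works for any nesting depth, so no group/repo regex is needed to build the id.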

I can try a fix but I'd like to first make sure I'm not wrong... (happen to have a PhD in being wrong 😛).

@Nathan-Furnal

Hi! Any update on the gitlab support and this issue? I've wanted to add a viewer in my README for a project and that would be just the thing I need =)

Thanks for the help
