Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add a repo provider for CKAN datasets #1833

Merged
merged 7 commits into from
Jul 6, 2024

Conversation

u10313335
Copy link
Contributor

This PR adds a repository provider for datasets on repositories built upon CKAN, which is an open-source DMS (data management system) for powering data hubs and data portals.

This PR requires a content provider for repo2docker: jupyterhub/repo2docker#1336.

We have adopted this PR in our Binder service (as the CKAN dataset repository): https://binder.depositar.io/, e.g.:

Copy link

welcome bot commented Mar 7, 2024

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@u10313335 u10313335 changed the title [WIP] Add a repo provider for CKAN datasets [MRG] Add a repo provider for CKAN datasets Mar 7, 2024
Copy link
Collaborator

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love to see the binderhub set up! Yay! Apologies for the late response.

This is a companion review to jupyterhub/repo2docker#1336 (review). And like that one, there're a couple minor stylistic issues.

But the fundamental question is one of versions and 'ref'. Should 'version' be a concept here directly specifyable, similar to 'ref' in git? I think you have enough knowledge of CKAN to help answer that.

Regardless, excited to find time to get this PR merged :)

client = AsyncHTTPClient()

api = parsed_repo._replace(
path=re.sub(self.url_regex, "/api/3/action/", parsed_repo.path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about using regexes as in jupyterhub/repo2docker#1336 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have made changes according to your suggestion.

except HTTPError:
return None

def parse_date(json_body):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this function is used only once, let's just inline that here. Otherwise the function gets redefined each time the parent function is called. Also adds another layer of indirection that's otherwise not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will fix this as soon as possible.

@u10313335
Copy link
Contributor Author

Thank you again for the review!

But the fundamental question is one of versions and 'ref'. Should 'version' be a concept here directly specifyable, similar to 'ref' in git? I think you have enough knowledge of CKAN to help answer that.

Please also find my comments at jupyterhub/repo2docker#1336 (comment).

Copy link
Collaborator

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay! I've left a review again here and on repo2docker.

"https://demo.ckan.org/dataset/sample-dataset-1",
"https://demo.ckan.org/dataset/sample-dataset-1",
"sample-dataset-1.v",
"https://demo.ckan.org/dataset/sample-dataset-1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some additional tests here for /history/ as well as ?activity_id= URLs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added.

async def get_resolved_ref(self):
parsed_repo = urlparse(self.repo)

url_parts_1 = parsed_repo.path.split("/history/")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this to use the parsing changes in jupyterhub/repo2docker#1336 (comment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure and updated.

@u10313335
Copy link
Contributor Author

Apologies for the delay! I've left a review again here and on repo2docker.

@yuvipanda Thank you for the review. I have made some modifications according to your suggestions. Please have a look.

@yuvipanda
Copy link
Collaborator

yuvipanda commented Jun 29, 2024

Steps to merge this:

  1. [MRG] Add CKAN content provider repo2docker#1336 should be merged (done)
  2. Add changelog for 2024.07.0 repo2docker#1356 should be merged (scheduled for monday)
  3. Make new release of repo2docker (should be done as part of merging (2))
  4. Bump the image used by binderhub to point to new version Bump version of repo2docker being used #1863
  5. then merge this PR.

No action currently needed from you, @u10313335!

@yuvipanda yuvipanda merged commit 497c2ac into jupyterhub:main Jul 6, 2024
15 checks passed
@yuvipanda
Copy link
Collaborator

Tested this locally (on https://demo.ckan.org/dataset/sample-dataset-1) and it works! Thank you very much for your patience in getting this through, @u10313335! Look forward to more contributions from you :)

@yuvipanda
Copy link
Collaborator

I'll deploy this on mybinder.org shortly

consideRatio pushed a commit to jupyterhub/helm-chart that referenced this pull request Jul 6, 2024
@yuvipanda
Copy link
Collaborator

Deployment to mybinder.org now complete, @u10313335! Thank you for your contribution :)

@u10313335
Copy link
Contributor Author

Thank you @yuvipanda!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants