A Django app for describing datasets and the governance required for their access. It also manages the data use agreements associated with the data.

Samuel-J-Wood-Library/datacatalog

Data Catalog

A Django app for cataloging institutional research datasets

Peter Oxley
Weill Cornell Medicine
Samuel J. Wood Library and C.V. Starr Biomedical Information Center
1300 York Ave
New York, NY 10065
[email protected]

Scope

The Data Catalog is an internal catalog of research datasets available to members of an institution. The catalog does not contain the data, but rather contains descriptions and metadata of each dataset. The catalog allows researchers to make their own datasets discoverable to others at the institution, and allows researchers to identify biomedical or health data that are not readily accessible elsewhere. The catalog provides the ability to search for datasets based on their title, description, data elements, keywords, or data governance. Researchers can also submit new datasets for display in the catalog. After new submissions have been reviewed by the curatorial team, they are then made available for discovery in the catalog.

Structure of the catalog

The catalog consists of 5 primary tables of information:

  1. Datasets - metadata regarding a specific data repository/file/database
  2. Data Providers - either the data source for a dataset, or a publisher of the data
  3. Data Use Agreements - specific terms of use for a specific set of users, for a delimited period of time
  4. Data Access Conditions - requirements for hosting the dataset (as specified by the publisher). Includes whether data must be secured, can be mixed with other data, and must be destroyed at end of project.
  5. Keywords - a customizable dictionary of terms that can be used to tag related datasets.
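How the five tables relate to one another can be sketched with plain Python dataclasses. This is only an illustration built from the descriptions above: every class and field name here is an assumption, and the app's actual Django models may be named and structured differently.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Keyword:
    term: str  # a tag from the customizable dictionary


@dataclass
class DataProvider:
    name: str  # the data source or the publisher of a dataset


@dataclass
class DataAccessCondition:
    # Hosting requirements as specified by the publisher (hypothetical fields).
    must_be_secured: bool
    may_be_mixed_with_other_data: bool
    destroy_at_project_end: bool


@dataclass
class DataUseAgreement:
    # Terms of use for a specific set of users over a delimited period.
    users: List[str]
    start: date
    end: date

    def is_active(self, on: date) -> bool:
        """Return True if the agreement covers the given day."""
        return self.start <= on <= self.end


@dataclass
class Dataset:
    # Metadata only; the catalog does not hold the data itself.
    title: str
    description: str
    provider: DataProvider
    access: Optional[DataAccessCondition] = None
    keywords: List[Keyword] = field(default_factory=list)
    agreements: List[DataUseAgreement] = field(default_factory=list)
```

In this sketch a dataset points to one provider and one set of access conditions, and can carry any number of keywords and agreements. That cardinality is a guess for illustration, not something the catalog documents.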

Setup

This app will need to be installed into an existing Django project.

  1. Make sure you have first installed the dependencies listed below.
  2. Download the latest code from https://github.com/oxpeter/datacatalog/archive/master.zip
  3. Copy the datacatalog directory into your Django project directory
  4. Add datacatalog.apps.DatacatalogConfig to INSTALLED_APPS in settings.py
  5. From the project directory, run python manage.py migrate datacatalog
  6. Make sure you have a base.html page defined, and that it includes navbar and content blocks (if this is running as a standalone app, create a base.html file in the root templates directory)
  7. Go to the Django admin page
  8. Under AUTHENTICATION AND AUTHORIZATION select Add Group
  9. Create the following groups:
    1. datacatalog_editor - give view/add/delete/change permissions to datacatalog tables
    2. dua_viewing_privileges - give view permissions to datacatalog tables
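Steps 4 and 9 can be sketched in Python as follows. The `INSTALLED_APPS` list assumes a default Django project layout, and `permission_codenames` is a hypothetical helper that only spells out Django's default `<action>_<model>` permission codenames. The model name passed to it below (`Dataset`) is an assumption; check the admin site for the real table names.

```python
# settings.py fragment: register the app next to Django's own apps (step 4).
# The django.contrib entries are the defaults of a new project and may
# differ in yours.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "datacatalog.apps.DatacatalogConfig",
]

# Actions each group should receive on the datacatalog tables (step 9).
GROUP_ACTIONS = {
    "datacatalog_editor": ("view", "add", "delete", "change"),
    "dua_viewing_privileges": ("view",),
}


def permission_codenames(group_name, model_names):
    """List the default Django permission codenames a group needs.

    Django names its built-in permissions "<action>_<lowercased model name>".
    """
    return [
        f"{action}_{model.lower()}"
        for model in model_names
        for action in GROUP_ACTIONS[group_name]
    ]


# "Dataset" is a hypothetical model name used only for illustration.
editor_perms = permission_codenames("datacatalog_editor", ["Dataset"])
```

The resulting codenames are the ones to tick for each group on the admin Add Group page.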

App permissions

Viewing data use agreements

Data Use Agreements are by default hidden from your users, unless they are added to the dua_viewing_privileges group.
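The group check described above amounts to a membership test. A minimal sketch, assuming the caller supplies the user's group names; `can_view_duas` is a hypothetical helper, not a function provided by the app.

```python
DUA_VIEWER_GROUP = "dua_viewing_privileges"


def can_view_duas(group_names):
    """Return True if the user belongs to the DUA viewing group.

    In a Django view the names could be obtained with
    request.user.groups.values_list("name", flat=True).
    """
    return DUA_VIEWER_GROUP in set(group_names)
```

For example, `can_view_duas(["datacatalog_editor"])` is false in this sketch, because membership in the editor group alone does not include the viewing group.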

Editing items

Regular users can submit new entries for all catalog types. These entries are not marked as curated or published, so they are not immediately visible in the catalog. Users in the datacatalog_editor group can view and modify the entries and, inside the admin site, can mark them as curated (to indicate they have been quality checked) and as published (to make them visible on the website). To assist with large-scale curation or publication, select multiple entries in the admin table and use the actions box to mark all selected items as published, unpublished, or curated.
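The submission and publication workflow can be modelled in a few lines of plain Python. This is a sketch of the logic only: the `Entry` class, its field names, and the helper functions are assumptions, and the app itself handles these steps through its Django models and admin actions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    curated: bool = False    # set by an editor after a quality check
    published: bool = False  # set by an editor to show the entry on the site


def mark_published(entries, value=True):
    """Bulk-set the published flag, like an admin action on selected rows."""
    for entry in entries:
        entry.published = value
    return len(entries)


def visible_in_catalog(entries):
    """Only published entries are shown to regular users."""
    return [entry for entry in entries if entry.published]


# New submissions start out neither curated nor published.
submissions = [Entry("Cohort A"), Entry("Registry B")]
hidden_count = len(submissions) - len(visible_in_catalog(submissions))

# An editor selects the first entry and publishes it.
mark_published(submissions[:1])
```

After the last call only the first entry appears in `visible_in_catalog(submissions)`. Its curated flag is still unset, since curation and publication are tracked separately.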

Dependencies

This app was developed and tested with Django 2.1. While it should work on all versions ≥2.0, we cannot guarantee performance on other versions.

You will also require the following Django apps:

You will also require the following Python packages:
