Code for working with the HEAL Common Data Elements (CDEs)

heal-data-stewards/heal-cdes

Scripts for managing HEAL CDEs

This repository contains several scripts for working with HEAL CDEs. Primarily, they convert the Excel representation of these HEAL CDEs into a JSON representation based on the data model used by the NIH CDE Repository (see JSON Schemas for Data Elements and Forms), and then convert these JSON files into other formats for use in downstream tools.

How to use

Getting started

We use venv to keep the Python environment isolated; the required packages are listed in requirements.txt.

$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Generating JSON files

The script generators/excel2cde.py recursively converts Excel files in the expected format from the input directory into JSON files in the output directory (output/json by default).

$ python generators/excel2cde.py [input-directory] [--output output_directory]
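
To illustrate what "recursively converts" might look like in terms of file paths, here is a minimal sketch that maps each Excel file under an input directory to a JSON path under the output directory, mirroring the subdirectory layout. This is not the code in generators/excel2cde.py: the function name plan_conversions, the choice to mirror subdirectories, and the skipping of `~$` lock files are all assumptions made for illustration.

```python
from pathlib import Path


def plan_conversions(input_dir, output_dir="output/json"):
    """Pair each .xlsx file under input_dir with a .json path under output_dir.

    Hypothetical helper: the real script may name or place outputs differently.
    """
    input_root = Path(input_dir)
    output_root = Path(output_dir)
    plan = []
    # rglob walks the input directory recursively.
    for xlsx_path in sorted(input_root.rglob("*.xlsx")):
        # Assumption: skip the temporary lock files Excel creates ("~$name.xlsx").
        if xlsx_path.name.startswith("~$"):
            continue
        relative = xlsx_path.relative_to(input_root)
        plan.append((xlsx_path, output_root / relative.with_suffix(".json")))
    return plan
```

Calling plan_conversions("input") only computes the pairs; it reads no Excel content and writes no files, so it can be used to preview which outputs a recursive run would be expected to produce.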

Converting JSON files to Excel templates

Excel template generation can be configured with the input/cde-template-locations.yaml file. Note particularly the template variable, which should be set to the location of the XLSX template (input/cde-template.xlsx by default). You should then run:

$ python exporters/xlsx-exporter.py -c input/cde-template-locations.yaml -o output/xlsx output/json

Annotating JSON files

Annotation generally requires sending the HEAL CDE text content to an online annotation process, followed by using the Translator Node Normalization service to filter and standardize the resulting annotations. This reliance on online services introduces several possible points of failure. To mitigate this, the annotation workflow is intended to be run through a Rakefile. The Rakefile in this repo contains instructions for building the annotated KGX output into the annotated/ directory.

$ rake
$ python validators/check_annotated.py annotated
$ mv annotated annotated/year-month-day
