This repository contains several scripts for working with HEAL CDEs. Primarily, it converts the Excel representation of these HEAL CDEs into a JSON representation based on the data model used by the NIH CDE Repository (see JSON Schemas for Data Elements and Forms), and then converts these JSON files into other formats for use in downstream tools.
We use venv to manage the Python environment; the required packages are listed in requirements.txt.
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
The script generators/excel2cde.py recursively converts Excel files in the
expected format into JSON files in the output directory (by default, the
output/json directory).
$ python generators/excel2cde.py [input-directory] [--output output_directory]
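To illustrate what "recursively converts" means here, the following is a minimal sketch of how input files could be discovered and mapped to output paths. It is not the code in generators/excel2cde.py: the function name plan_conversions, the accepted file suffixes, and the choice to mirror the input directory layout under the output directory are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: these suffixes count as "Excel files"; the real script may differ.
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def plan_conversions(input_dir, output_dir="output/json"):
    """Return (source, destination) pairs for every Excel file under input_dir.

    Hypothetical helper: destinations mirror the input directory layout
    under output_dir, with the file suffix replaced by .json.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    plan = []
    # rglob("*") walks the whole directory tree, not just the top level.
    for path in sorted(input_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXCEL_SUFFIXES:
            continue
        # Skip the temporary lock files Excel creates while a workbook is open.
        if path.name.startswith("~$"):
            continue
        relative = path.relative_to(input_dir)
        plan.append((path, output_dir / relative.with_suffix(".json")))
    return plan
```

Calling plan_conversions("input") would list every workbook found under input/ together with the JSON path it would be written to, without converting anything.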
Excel template generation can be configured with the
input/cde-template-locations.yaml file. Note particularly the template variable,
which should be set to the location of the XLSX template
(input/cde-template.xlsx by default). You should then run:
$ python exporters/xlsx-exporter.py -c input/cde-template-locations.yaml -o output/xlsx output/json
Annotation generally requires sending the HEAL CDE text content to an
online annotation process, followed by using the Translator Node Normalization
service to filter and standardize the resulting annotations. This reliance
on online services introduces several possible points of failure. To mitigate
this, the annotation workflow is intended to be run through a Rakefile.
The Rakefile in this repo contains instructions for building the annotated
KGX output into the annotated/ directory.
$ rake
$ python validators/check_annotated.py annotated
$ mv annotated annotated/year-month-day
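The year-month-day placeholder in the last command stands for a dated directory name. A minimal sketch of building such a name follows, assuming ISO 8601 formatting (YYYY-MM-DD), which this README does not specify. The helper dated_directory is hypothetical and not part of this repository.

```python
from datetime import date
from pathlib import Path


def dated_directory(base="annotated", day=None):
    """Return base/YYYY-MM-DD for the given day (today if none is given).

    Hypothetical helper; the ISO date format is an assumption.
    """
    day = day or date.today()
    # date.isoformat() yields a zero-padded YYYY-MM-DD string.
    return Path(base) / day.isoformat()
```

The function only computes the path; creating the directory and moving the output into it is left to the caller.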