Scripts for managing HEAL CDEs

This repository contains several scripts for working with HEAL CDEs. Primarily, they convert the Excel representation of these HEAL CDEs into a JSON representation based on the data model used by the NIH CDE Repository (see JSON Schemas for Data Elements and Forms), and then convert these JSON files into other formats for use in downstream tools.

How to use

Getting started

We use venv to create an isolated Python environment; the required packages are listed in requirements.txt.

$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Generating JSON files

The script generators/excel2cde.py recursively converts Excel files in the expected format from the input directory into JSON files in the output directory (output/json by default).

$ python generators/excel2cde.py [input-directory] [--output output_directory]
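
To illustrate what "recursively converts" means in practice, here is a minimal sketch of how an input directory could be walked and each Excel file mapped to a JSON path that mirrors its location. This is not the code in generators/excel2cde.py: the function name plan_conversions, the mirrored directory layout, and the skipping of Excel lock files are all assumptions made for illustration.

```python
from pathlib import Path


def plan_conversions(input_dir, output_dir="output/json"):
    """Pair every .xlsx file under input_dir with a mirrored .json output path.

    Hypothetical helper: the real script may name or lay out its outputs
    differently.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    plan = []
    # rglob descends into every subdirectory of input_dir.
    for xlsx_path in sorted(input_dir.rglob("*.xlsx")):
        # Skip the temporary lock files Excel creates (e.g. "~$form.xlsx").
        if xlsx_path.name.startswith("~$"):
            continue
        relative = xlsx_path.relative_to(input_dir)
        plan.append((xlsx_path, output_dir / relative.with_suffix(".json")))
    return plan
```

Calling plan_conversions on an input directory returns a list of (source, target) pairs that can be inspected before any files are written.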

Converting JSON files to Excel templates

Excel template generation can be configured with the input/cde-template-locations.yaml file. Note particularly the template variable, which should be set to the location of the XLSX template (input/cde-template.xlsx by default). You should then run:

$ python exporters/xlsx-exporter.py -c input/cde-template-locations.yaml -o output/xlsx output/json

Annotating JSON files

Annotation generally requires sending the HEAL CDE text content to an online annotation process, followed by using the Translator Node Normalization service to filter and standardize the resulting annotations. This reliance on online services introduces several possible points of failure. To mitigate this, the annotation workflow is intended to be run through a Rakefile. The Rakefile in this repo contains instructions for building the annotated KGX output into the annotated/ directory.

$ rake
$ python validators/check_annotated.py annotated
$ mv annotated annotated/year-month-day
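
One reason a build tool helps with unreliable online services is that a failed run can be repeated without redoing the steps that already succeeded. The sketch below shows that general idea in Python. It is an assumption about why the workflow is structured this way, not a description of what the Rakefile does, and the names run_resumable, annotate, and the ".annotated.json" suffix are hypothetical.

```python
from pathlib import Path


def run_resumable(inputs, output_dir, annotate):
    """Annotate each input file, skipping any whose output already exists.

    annotate is any callable taking text and returning text; it stands in
    for a call to an online service and may raise on failure.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    done, failed = [], []
    for input_path in map(Path, inputs):
        target = output_dir / (input_path.stem + ".annotated.json")
        if target.exists():
            continue  # built by an earlier run; nothing to redo
        try:
            result = annotate(input_path.read_text(encoding="utf-8"))
        except Exception as exc:
            # Record the failure and keep going; a later run can retry it.
            failed.append((input_path, exc))
            continue
        # Write to a temporary file first so that an interrupted write
        # never leaves a partial file that looks like a finished target.
        tmp = target.with_suffix(".tmp")
        tmp.write_text(result, encoding="utf-8")
        tmp.replace(target)
        done.append(target)
    return done, failed
```

Running the function a second time after a service outage only retries the inputs recorded in failed, since every completed target is skipped.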
