A massively scalable Python implementation of an ETL that converts Synthea CSV data

scivm/ETL-Synthea-Python


ETL-Synthea-Python

Release date: Feb 16, 2020

This project contains the source code to convert Synthea CSV data into CSV files suitable for loading into an OMOP Common Data Model (CDM) v5.3.1 or v6 database.

Synthea can generate an unlimited number of synthetic patient records for multiple countries.

This tool converts Synthea CSV output to OMOP CDM v5 and v6.
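As a rough illustration of what such a conversion involves, the sketch below maps rows of a Synthea-style `patients.csv` to rows of an OMOP `person` table, streaming one row at a time. It is not the code in `python_etl`: the function names `patient_to_person` and `convert_patients` are hypothetical, the input columns `Id`, `BIRTHDATE` and `GENDER` are assumed to follow Synthea's CSV export, and only a handful of `person` fields are filled in. The gender concept IDs (8507 for male, 8532 for female) are the standard OMOP ones.

```python
import csv
import io

# Standard OMOP gender concept IDs; 0 is used when no concept matches.
GENDER_CONCEPT = {"M": 8507, "F": 8532}


def patient_to_person(row, person_id):
    """Map one Synthea patient row (a dict) to a partial OMOP person row.

    Hypothetical helper: assumes the columns Id, BIRTHDATE (YYYY-MM-DD)
    and GENDER are present in the input row.
    """
    year, month, day = (int(part) for part in row["BIRTHDATE"].split("-"))
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPT.get(row["GENDER"], 0),
        "year_of_birth": year,
        "month_of_birth": month,
        "day_of_birth": day,
        "person_source_value": row["Id"],
        "gender_source_value": row["GENDER"],
    }


def convert_patients(src, dst):
    """Stream patients from file object `src` to person rows in `dst`.

    Rows are processed one at a time, so memory use does not grow with
    the number of patients. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    writer = None
    count = 0
    for count, row in enumerate(reader, start=1):
        # Sequential integers stand in for OMOP's numeric person_id.
        person = patient_to_person(row, count)
        if writer is None:
            writer = csv.DictWriter(dst, fieldnames=list(person))
            writer.writeheader()
        writer.writerow(person)
    return count


if __name__ == "__main__":
    sample = io.StringIO("Id,BIRTHDATE,GENDER\nabc-123,1980-02-16,F\n")
    out = io.StringIO()
    convert_patients(sample, out)
    print(out.getvalue())
```

A real ETL would also need to resolve race, ethnicity and location, and to map source codes to standard concepts through the OMOP vocabulary tables; those steps are left out here.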

What's in Here?

python_etl

A complete Python-based ETL of the Synthea data into CDMv5 and CDMv6 compatible CSV files. See the README.md file therein for detailed instructions for running the ETL, as well as creating and loading the data into a CDMv5 database.

scripts

The scripts folder holds handy scripts for downloading and munging some of the raw data used in the ETL process. Instructions for their use can be found in the python_etl/README.md file.

hand_conversion

A couple of patients' worth of SynPUF data, hand-converted into CDMv5.

Additional Resources

  • The OHDSI Medicare ETL SynPUF.pdf provides a light overview of the differences between SynPUF and other Medicare datasets, such as SEER Medicare and Medicare LDS. It was presented to the OHDSI CMS ETL workgroup in February 2015 by Jennifer Duryea at Outcomes Insights.

History of contributions

Based on CMS-ETL written by:

  • Ryan Duryea @aguynamedryan, Outcomes Insights, Inc.
  • Erica Voss @ericaVoss, Janssen Research and Development.
  • Jennifer Duryea @jenniferduryea, Outcomes Insights, Inc.
  • Don O'Hara @donohara, Evidera.
  • Claire Cangialose @claire-oi, Outcomes Insights, Inc.
  • Patrick Ryan @Patrick_Ryan, Janssen Research and Development.
  • Christophe Lambert @Christophe_Lambert, University of New Mexico, Center for Global Health, Division of Translational Informatics, Department of Internal Medicine
  • Praveen Kumar @Praveen_Kumar, University of New Mexico, Department of Computer Science
  • Amritansh @Amritansh, University of New Mexico, Department of Computer Science
