Repository for quality evaluation and benchmarking synthetic datasets with metrics through a friendly GUI

bmi-labmedinfo/SynthRO

SynthRO: a dashboard to evaluate and benchmark synthetic data

Table of Contents
  1. About The Project
  2. Installation
  3. Extensibility
  4. License

About The Project

The rapid increase in patient data collection by healthcare providers, governments, and private industries is generating vast and varied datasets that provide new insights into critical medical questions. Despite the rise of medical devices powered by Artificial Intelligence, research access to data remains restricted due to privacy concerns. One possible solution is to use Synthetic Data, which replicates the main statistical properties of real patient data. However, the lack of standardized evaluation metrics makes selecting appropriate synthetic data methods challenging. Effective evaluation must balance resemblance, utility, and privacy, but current benchmarking efforts are limited, necessitating further research.

To address this constraint, we've introduced SynthRO (Synthetic data Rank and Order), a user-friendly tool designed to benchmark synthetic health tabular data across various contexts. SynthRO provides accessible quality evaluation metrics and automated benchmarking, enabling users to identify the most suitable synthetic data models for specific applications by prioritizing metrics and delivering consistent quantitative scores.

Installation

This repository provides a Conda environment configuration file (synthro_env.yml) to streamline the setup process. Follow these steps to create the environment:

Important

Make sure you have Conda installed. If not, install Conda before proceeding.

Steps to Create the Environment

  1. Create the Conda Environment

    Run the following command to create the environment using the provided .yml file:

    conda env create -f synthro_env.yml

    This command creates a Conda environment whose name is taken from the name field in the synthro_env.yml file.

  2. Activate the Environment

    Once the environment is created, activate it using:

    conda activate synthro_env

Running the Code

Once the Conda environment is activated, launch the application with:

python SynthRO_app.py

Additional Notes

To deactivate the environment, simply use:

conda deactivate

Tip

If you want to try the tool, here you will find an example of an original and synthetic dataset.

Extensibility

The tool has a modular structure, allowing new sections and evaluation metrics to be added at any time.

Methodology

To add a new metric, implement its code in one of the classes already defined in the utils.py script. For instance, a new type of simulated attack among the privacy metrics should be added as a static method of the Privacy class:

class Privacy:

    # Other implemented methods

    @staticmethod
    def new_simulated_attack():
        # Code for the new method
        pass

Afterwards, the new method must be invoked within the main script (SynthRO_app.py).
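
As a rough illustration of this pattern, the sketch below fills in the template with a deliberately simple check: the fraction of synthetic records that exactly duplicate a real record. The method name exact_match_rate, its parameters, and the toy records are all hypothetical and are not taken from utils.py. The actual Privacy class may expect different inputs (for example, data frames), so treat this as a minimal sketch rather than the tool's own implementation.

```python
from typing import Hashable, Iterable, Sequence


class Privacy:

    # Other implemented methods

    @staticmethod
    def exact_match_rate(
        real_rows: Iterable[Sequence[Hashable]],
        synthetic_rows: Sequence[Sequence[Hashable]],
    ) -> float:
        """Hypothetical metric: share of synthetic rows that copy a real row."""
        if not synthetic_rows:
            raise ValueError("synthetic_rows must not be empty")
        # Store real rows as tuples so membership tests are O(1) on average.
        real_set = {tuple(row) for row in real_rows}
        matches = sum(1 for row in synthetic_rows if tuple(row) in real_set)
        return matches / len(synthetic_rows)


# Invocation, as it might appear in the main script (toy data, assumed layout).
real = [(63, "F", 1), (45, "M", 0), (71, "M", 1)]
synthetic = [(63, "F", 1), (50, "F", 0), (45, "M", 0), (38, "M", 0)]
rate = Privacy.exact_match_rate(real, synthetic)
print(f"Exact match rate: {rate:.2f}")
```

Keeping the metric a static method means it can be called without instantiating Privacy, which matches the template above.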

Graphical Interface

The graphical interface was developed using the Dash package in Python. Once the new metric is defined, it can be integrated into the existing graphical elements or a new section can be created using the graphical elements provided by the package.

The SynthRO_app.py script is divided into well-defined sections, making it easy to find where new graphical elements should be added.

License

SynthRO © 2024 by Gabriele Santangelo is licensed under CC BY-NC-SA 4.0.
