Using Clickhouse OLAP to support Study View cohort queries (pilot)

Description

This repo provisions and runs a Clickhouse instance loaded with data from the msk_met_2021, msk_ch_2020 and msk_impact_2017 datahub studies. A modified cBioPortal backend can use this instance to run cohort/filter queries in Study View.

Connection with cBioPortal MySQL database

Clickhouse performs well for analytical queries (filtering on column values) but is less suitable for retrieving all column values of an entity (typically SELECT * FROM ...). In the current implementation, the samples table contains a column with the internal sample identifiers used in the cBioPortal MySQL database. Once Clickhouse has determined which sample identifiers belong to the cohort, the corresponding sample objects can be retrieved efficiently from the MySQL database (with SELECT * FROM sample ...).

The Clickhouse schema is defined in the clickhouse_provisioning/ directory.
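
The two-step lookup described in this section can be sketched as follows. This is only an illustration, not the backend's actual code: two in-memory sqlite3 databases stand in for Clickhouse and MySQL so the example runs without either server, and the column names (internal_id, cancer_type, stable_id) and function names are assumptions made for the example.

```python
import sqlite3

# Stand-in for Clickhouse: a filterable samples table (hypothetical columns).
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE samples (internal_id INTEGER, cancer_type TEXT)")
olap.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(1, "Breast"), (2, "Lung"), (3, "Breast")],
)

# Stand-in for the cBioPortal MySQL database (hypothetical columns).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE sample (internal_id INTEGER PRIMARY KEY, stable_id TEXT)")
oltp.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(1, "P-001-T01"), (2, "P-002-T01"), (3, "P-003-T01")],
)


def cohort_sample_ids(conn, cancer_type):
    """Step 1: the analytical store returns only the matching identifiers."""
    rows = conn.execute(
        "SELECT internal_id FROM samples WHERE cancer_type = ? ORDER BY internal_id",
        (cancer_type,),
    )
    return [row[0] for row in rows]


def fetch_samples(conn, internal_ids):
    """Step 2: the row store returns the full rows for those identifiers."""
    if not internal_ids:
        return []
    placeholders = ", ".join("?" for _ in internal_ids)
    query = (
        f"SELECT * FROM sample WHERE internal_id IN ({placeholders}) "
        "ORDER BY internal_id"
    )
    return conn.execute(query, list(internal_ids)).fetchall()


ids = cohort_sample_ids(olap, "Breast")
samples = fetch_samples(oltp, ids)
print(ids, samples)
```

The filter runs only against the analytical table, and the SELECT * runs only against the row-oriented table, keyed by the shared internal identifier.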

Installation

Edit the study_configs section in the create_clickhouse_db_table_files.py file to reflect the paths to the msk_met_2021, msk_ch_2020 and msk_impact_2017 datahub studies:

study_configs = [
    {
        "study_dir": "/home/pnp300/git/datahub/public/msk_met_2021",
        "name": "msk_met_2021"
    },
    {
        "study_dir": "/home/pnp300/git/datahub/public/msk_ch_2020",
        "name": "msk_ch_2020"
    },
    {
        "study_dir": "/home/pnp300/git/datahub/public/msk_impact_2017",
        "name": "msk_impact_2017"
    }
]
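
Before generating the staging files, it may help to confirm that every configured path exists. The helper below is a hypothetical addition, not part of create_clickhouse_db_table_files.py; it only assumes that each entry has the "study_dir" and "name" keys shown above.

```python
import os


def missing_study_dirs(study_configs):
    """Return the names of configured studies whose study_dir is not a directory."""
    return [
        config["name"]
        for config in study_configs
        if not os.path.isdir(config["study_dir"])
    ]
```

Calling missing_study_dirs(study_configs) returns an empty list when all paths are valid, and otherwise lists the studies whose paths still need to be edited.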

Create Clickhouse staging files in the clickhouse_provisioning directory (in this repo) by running the create_clickhouse_db_table_files.py script:

python3 create_clickhouse_db_table_files.py

Provision and start Clickhouse using the docker-compose.yml file:

docker-compose up

or for detached mode:

docker-compose up -d

This will start a Clickhouse instance with port 8123 exposed on the host system.
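
To check that the instance is reachable, a small script can call the /ping endpoint of Clickhouse's HTTP interface, which answers with "Ok." when the server is up. This is a minimal sketch, assuming the exposed port 8123 is the HTTP interface (Clickhouse's default) and that the container is reachable on localhost; the function names are illustrative.

```python
import http.client
import urllib.error
import urllib.request


def ping_url(host="localhost", port=8123):
    """Build the URL of the Clickhouse HTTP /ping endpoint."""
    return f"http://{host}:{port}/ping"


def clickhouse_is_up(host="localhost", port=8123, timeout=2.0):
    """Return True if the HTTP interface answers /ping with 'Ok.'."""
    try:
        with urllib.request.urlopen(ping_url(host, port), timeout=timeout) as response:
            return response.read().decode().strip() == "Ok."
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: treat as not up.
        return False
```

Running clickhouse_is_up() shortly after docker-compose up may return False until the container has finished starting, so retrying after a few seconds is reasonable.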
