DICSIT

This repository contains the code for the DICSIT workflow (detection of important cell subsets in single-cell transcriptomics). R Markdown files for data preprocessing are in the r_markdown folder, together with their output Markdown documents.

The original CellCnn code on which DICSIT is based is included in the python folder. In addition, there are two Python scripts for running a DICSIT/CellCnn analysis on preprocessed data: main.py and generate_param_grid.py. The former runs a single analysis, identified by the name passed via --name, with the following command. For details on the arguments and options, run python3 main.py -h.

python3 main.py --ncell NCELL \
                --nsubset NSUBSET \
                --nfeatures NFEATURES \
                --train_data TRAIN_DATA \
                --test_data TEST_DATA \
                --output_path OUTPUT_PATH \
                --response_data RESPONSE_DATA \
                --response RESPONSE \
                --sample_col SAMPLE_COL \
                --name NAME
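
If a single run needs to be launched from another Python program, the command above can be assembled as an argument list. The following is a minimal sketch, not part of the repository: the helper build_main_command and every file name, column name, and value in the example call are hypothetical placeholders. The values 500 and 1000 for ncell and nsubset are the defaults that generate_param_grid.py is described as using; main.py itself may not define any defaults.

```python
import shlex


def build_main_command(name, train_data, test_data, response_data,
                       response, sample_col, output_path,
                       nfeatures, ncell=500, nsubset=1000):
    """Return the argv list for one main.py run, using only the flags listed above."""
    options = {
        "ncell": ncell,
        "nsubset": nsubset,
        "nfeatures": nfeatures,
        "train_data": train_data,
        "test_data": test_data,
        "output_path": output_path,
        "response_data": response_data,
        "response": response,
        "sample_col": sample_col,
        "name": name,
    }
    command = ["python3", "main.py"]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    return command


# Every value below is a hypothetical placeholder, not a file shipped with DICSIT.
command = build_main_command(
    name="example_run",
    train_data="train.csv",
    test_data="test.csv",
    response_data="response.csv",
    response="outcome",
    sample_col="sample_id",
    output_path="results",
    nfeatures=50,
)
print(shlex.join(command))
# To actually start the analysis: subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.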

The latter runs analyses with different numbers of features (by default, from 10 to 100 in steps of 10) for different splits of the data, by repeatedly calling main.py. By default, it assumes that the training and test data CSV files for the different splits are named train_data_split_{i}.csv, where i runs from 1 to n_splits (by default, 3). The parameters ncell and nsubset default to 500 and 1000, respectively. Several arguments can be provided to test other settings. For details on the arguments and options, run python3 generate_param_grid.py -h.

python3 generate_param_grid.py --name NAME \
                               --data_path DATA_PATH \
                               --response_data RESPONSE_DATA \
                               --output_path OUTPUT_PATH
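
To make the default grid concrete, the sketch below enumerates the combinations described above: 3 splits times 10 feature counts. It is an illustration only and does not reproduce the internals of generate_param_grid.py. The function param_grid and the data directory "data" are hypothetical, and the range is assumed to include the upper bound of 100. Only the training file name is built, because that is the only pattern stated above.

```python
from itertools import product
from pathlib import Path


def param_grid(data_path, n_splits=3, nfeatures_values=range(10, 101, 10)):
    """Yield one settings dict per (split, nfeatures) combination."""
    for split, nfeatures in product(range(1, n_splits + 1), nfeatures_values):
        yield {
            "split": split,
            "nfeatures": nfeatures,
            "ncell": 500,      # default stated in the text above
            "nsubset": 1000,   # default stated in the text above
            "train_data": Path(data_path) / f"train_data_split_{split}.csv",
        }


# "data" is a hypothetical directory name used only for this example.
grid = list(param_grid("data"))
print(len(grid))  # 30 combinations: 3 splits x 10 feature counts
print(grid[0]["train_data"].name)  # train_data_split_1.csv
```

Each dict corresponds to one main.py run, so the default configuration amounts to 30 runs under these assumptions.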

Jupyter notebooks for downstream analysis of the selected cell subsets are included in the notebooks folder. Additionally, the r_markdown folder contains code for evaluating differentially expressed genes between selected and unselected cells.