Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.
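The idea of conditioning the decoder on the cell embedding can be illustrated with a small, dependency-free sketch. This is not the scooby implementation. It assumes one simple form of conditioning, in which a dense layer maps the cell's embedding vector to a set of decoder weights that are then applied to every genomic bin of the sequence embedding. All names (`linear`, `softplus`, `decode_cell_profile`) and all dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer for a single vector: returns weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softplus(z):
    """Numerically stable softplus, so predicted values are never negative."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def decode_cell_profile(seq_embedding, cell_embedding, w, b):
    """Predict one value per genomic bin for a single cell (illustrative only).

    seq_embedding:  list of bins, each a vector of length D
    cell_embedding: vector of length E (cell position in a precomputed embedding)
    w, b:           parameters mapping the cell embedding to D decoder weights
    """
    cell_weights = linear(cell_embedding, w, b)  # length D, specific to this cell
    return [
        softplus(sum(s * c for s, c in zip(bin_vec, cell_weights)))
        for bin_vec in seq_embedding
    ]


# Toy dimensions and random values, assumed for illustration only.
rng = random.Random(0)
n_bins, d, e = 6, 4, 3
seq_embedding = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_bins)]
w = [[rng.gauss(0, 1) for _ in range(e)] for _ in range(d)]
b = [0.0] * d

cell_a = [0.5, -1.0, 0.2]
cell_b = [-0.3, 0.8, 1.1]
profile_a = decode_cell_profile(seq_embedding, cell_a, w, b)
profile_b = decode_cell_profile(seq_embedding, cell_b, w, b)
print(profile_a)
print(profile_b)
```

The same sequence embedding yields a different profile for each cell because only the cell-derived weights change.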
This repository contains model and data loading code and a train script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.
- NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)
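A quick way to compare a machine against these requirements is sketched below, using only the Python standard library. It rests on two assumptions that the requirements do not state: that Python 3.9 can be treated as a lower bound (it is only the tested version), and that finding `nvidia-smi` on the `PATH` is a reasonable proxy for an NVIDIA GPU with a working driver. The function name `check_environment` is hypothetical.

```python
import platform
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Return a dict mapping each requirement to True/False for this machine."""
    return {
        # The README lists Linux as the supported operating system.
        "linux": platform.system() == "Linux",
        # Assumption: treat the tested Python version as a minimum.
        "python": sys.version_info >= min_python,
        # Assumption: nvidia-smi on PATH indicates an NVIDIA driver is installed.
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
    }


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'not found'}")
```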
scooby uses a custom version of SnapATAC2, which is built using Rust:
- Install Rust with
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
pip install "git+https://github.com/lauradmartens/SnapATAC2.git#egg=snapatac2&subdirectory=snapatac2-python"
pip install git+https://github.com/gagneurlab/scooby.git
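After installation, a short check like the one below can confirm that both packages are visible to the current Python environment without importing them. It assumes the import names are `snapatac2` and `scooby`, matching the package names in the commands above. The helper `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: the installed packages are importable under these names.
missing = missing_modules(["snapatac2", "scooby"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("snapatac2 and scooby are installed.")
```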
- Download file contents from the Zenodo repo
- Use examples from the scooby reproducibility repository
We offer a train script for modeling scRNA-seq only and a script for multiome modeling. Both require SnapATAC2-preprocessed AnnData objects and embeddings. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128 GB RAM and 32 cores.
Currently, the model is only tested with a batch size of 1.