gagneurlab/scooby

Scooby

Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.
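
To make the idea of a cell-conditioned decoder concrete, here is a toy sketch in plain Python. It is not the scooby implementation and does not use Borzoi: the function names, the dimensions, and the choice of mapping the cell embedding to the weights of a linear readout are all assumptions made for illustration. It only shows how the same sequence embeddings can yield different profiles for different cells.

```python
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x (plain-Python lists)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def init_linear(n_in, n_out, rng):
    """Randomly initialise a weight matrix (n_out x n_in) and a zero bias."""
    weight = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias


def cell_conditioned_profile(seq_embeddings, cell_embedding, hyper):
    """Predict one value per genomic bin for one cell.

    `hyper` (hypothetical) maps the cell embedding to the parameters of a
    linear readout: one weight per sequence feature plus a bias.
    """
    params = linear(cell_embedding, *hyper)
    readout_weights, readout_bias = params[:-1], params[-1]
    return [sum(w * h for w, h in zip(readout_weights, bin_embedding)) + readout_bias
            for bin_embedding in seq_embeddings]


rng = random.Random(0)
n_bins, seq_dim, cell_dim = 4, 8, 3  # toy sizes, chosen arbitrarily

# Stand-in for the sequence embeddings of one genomic window.
seq_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(seq_dim)]
                  for _ in range(n_bins)]
hyper = init_linear(cell_dim, seq_dim + 1, rng)

# Two cells at different positions in a (made-up) single-cell embedding.
cell_a = [0.5, -1.0, 0.2]
cell_b = [-0.3, 0.8, 1.1]

profile_a = cell_conditioned_profile(seq_embeddings, cell_a, hyper)
profile_b = cell_conditioned_profile(seq_embeddings, cell_b, hyper)
print(profile_a)
print(profile_b)
```

The sequence embeddings are computed once per genomic window, while the readout changes with the cell, so a single window can be decoded for many cells.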

This repository contains the model code, the data loading code, and a training script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.

Hardware requirements

  • NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)

Installation instructions

Prerequisites

scooby uses a custom version of SnapATAC2, which is built with Rust:

  • Install rust with curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • pip install "git+https://github.com/lauradmartens/SnapATAC2.git#egg=snapatac2&subdirectory=snapatac2-python"

Scooby package installation

  • pip install git+https://github.com/gagneurlab/scooby.git
  • Download file contents from the Zenodo repo
  • Use examples from the scooby reproducibility repository
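
After installation, a quick sanity check can confirm that the packages are visible to the interpreter. The sketch below uses only the standard library. The import names `snapatac2` and `scooby` are assumptions based on the package names above, and the helper `check_environment` is hypothetical, not part of scooby.

```python
import importlib.util
import platform
import sys


def check_environment(modules=("snapatac2", "scooby")):
    """Report the Python version, whether the OS is Linux, and whether
    each top-level module name can be found (without importing it)."""
    report = {
        "python": platform.python_version(),
        "linux": sys.platform.startswith("linux"),
    }
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A `True` entry only means that some module with that name is importable. It does not verify that it is the custom SnapATAC2 build or this repository's package.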

Training

We provide one training script for scRNA-seq-only modeling and one for multiome modeling. Both require SnapATAC2-preprocessed AnnData objects and embeddings. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128 GB RAM and 32 cores.

Model architecture

The model has so far only been tested with a batch size of 1.

[Figure: model architecture]