GitHub - genecell/COSG: Accurate and fast cell marker gene identification with COSG

Accurate and fast cell marker gene identification with COSG

Overview

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq and spatially resolved transcriptome data.
Marker genes or genomic regions identified by COSG are more indicative and show greater cell-type specificity.
COSG is ultrafast on large-scale datasets and can identify marker genes for one million cells in less than two minutes.

The method and benchmarking results are described in Dai et al. (2022).
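
To give a feel for what "cosine similarity-based" means here, the sketch below compares a gene's per-cell expression vector with a 0/1 indicator vector for each cell group. It is a simplified illustration of the general idea only, not COSG's scoring: it omits the actual COSG score (including the role of the `mu` parameter), which is defined in Dai et al. (2022). The helper names and the toy data are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def group_similarities(expression, labels):
    """Compare one gene's per-cell expression with a 0/1 indicator vector per group."""
    groups = sorted(set(labels))
    return {
        g: cosine_similarity(expression, [1.0 if lab == g else 0.0 for lab in labels])
        for g in groups
    }

# Toy data (hypothetical): six cells in two groups.
labels = ["T", "T", "T", "B", "B", "B"]
specific_gene = [5.0, 4.0, 6.0, 0.0, 0.0, 0.0]  # expressed only in the "T" group
broad_gene = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]     # expressed equally in every cell

specific_sim = group_similarities(specific_gene, labels)
broad_sim = group_similarities(broad_gene, labels)
print(specific_sim)
print(broad_sim)
```

The group-specific gene has a similarity close to 1 for its own group and exactly 0 for the other, while the broadly expressed gene gets the same intermediate value for both groups, so it does not single out either one.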

Documentation

The documentation for COSG is available here.

Tutorial

The COSG tutorial provides a quick-start guide and demonstrates the superior performance of COSG compared with other methods. A Jupyter notebook version is also available.

Question

For questions about the code and tutorial, please contact Min Dai, [email protected].

Example

Run COSG:

import cosg
import pandas as pd
import scanpy as sc

# adata is an AnnData object whose .obs contains the grouping column below
groupby = 'CellTypes'
cosg.cosg(
    adata,
    key_added='cosg',
    # use_raw=False, layer='log1p',  # e.g., to use the log1p layer in adata
    mu=100,
    expressed_pct=0.1,
    remove_lowly_expressed=True,
    n_genes_user=100,
    groupby=groupby,
)

Draw the dot plot:

# Compute a dendrogram of the groups on the PCA representation
sc.tl.dendrogram(adata, groupby=groupby, use_rep='X_pca')

# Take the top 3 markers per group, ordered as in the dendrogram
df_tmp = pd.DataFrame(adata.uns['cosg']['names'][:3]).T
df_tmp = df_tmp.reindex(adata.uns['dendrogram_' + groupby]['categories_ordered'])
marker_genes_list = {idx: list(row.values) for idx, row in df_tmp.iterrows()}
# Drop groups whose marker entries are missing (NaN floats rather than gene names)
marker_genes_list = {k: v for k, v in marker_genes_list.items() if not any(isinstance(x, float) for x in v)}

sc.pl.dotplot(
    adata,
    marker_genes_list,
    groupby=groupby,
    dendrogram=True,
    swap_axes=False,
    standard_scale='var',
    cmap='Spectral_r',
)

Output the marker list as a pandas DataFrame:

marker_gene=pd.DataFrame(adata.uns['cosg']['names'])
marker_gene.head()

You can also inspect the COSG scores:

marker_gene_scores=pd.DataFrame(adata.uns['cosg']['scores'])
marker_gene_scores.head()
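
If a single table is more convenient than two separate ones, the names and scores can be combined into one long-format DataFrame. The sketch below assumes that `adata.uns['cosg']['names']` and `adata.uns['cosg']['scores']` are NumPy structured arrays with one field per group, as the `pd.DataFrame(...)` calls above suggest. To stay self-contained it uses a small stand-in dictionary with made-up values instead of a real `adata`; the helper name `markers_long` is hypothetical and not part of COSG.

```python
import numpy as np
import pandas as pd

# Stand-in for adata.uns['cosg'] (hypothetical values): structured arrays
# with one field per group, rows ordered from best to worst marker.
names = np.array([("CD3D", "CD79A"), ("CD3E", "MS4A1")],
                 dtype=[("T", "U10"), ("B", "U10")])
scores = np.array([(0.91, 0.88), (0.85, 0.80)],
                  dtype=[("T", "f4"), ("B", "f4")])
cosg_result = {"names": names, "scores": scores}

def markers_long(result):
    """Stack per-group marker names and scores into one long-format table."""
    names_df = pd.DataFrame(result["names"])
    scores_df = pd.DataFrame(result["scores"])
    frames = []
    for group in names_df.columns:
        frames.append(pd.DataFrame({
            "group": group,
            "rank": range(1, len(names_df) + 1),  # 1 = top-ranked marker
            "gene": names_df[group].to_numpy(),
            "score": scores_df[group].to_numpy(),
        }))
    return pd.concat(frames, ignore_index=True)

long_df = markers_long(cosg_result)
print(long_df)
```

With real results, `markers_long(adata.uns['cosg'])` should work the same way if the stated assumption about the stored structure holds, and the table can then be written out with `long_df.to_csv(...)`.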

Citation

If COSG is useful for your research, please consider citing Dai et al. (2022).
