-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No cistromes found #30
Comments
Hi @saketkc Exciting to see people using it on pig data. From your screenshot it looks like your motifs are not annotated to TFs. (i.e. Direct_annot and Orthology_annot are NaN). Did you create a custom motif-to-TF annotation for pig? i.e. the following parameter Best Seppe |
Thanks @SeppeDeWinter! I did create a custom motif-to-TF annotation (gist here):
I also modified the associated function to read this file directly, keeping everything else the same: def load_motif_annotations(specie: str,
version: str = 'v9',
fname: str = None,
column_names=('#motif_id', 'gene_name',
'motif_similarity_qvalue', 'orthologous_identity', 'description'),
motif_similarity_fdr: float = 0.001,
orthologous_identity_threshold: float = 0.0):
"""
Load motif annotations from a motif2TF snapshot.
Parameters
---------
specie:
Specie to retrieve annotations for.
version:
Motif collection version.
fname:
The snapshot taken from motif2TF.
column_names:
The names of the columns in the snapshot to load.
motif_similarity_fdr:
The maximum False Discovery Rate to find factor annotations for enriched motifs.
orthologuous_identity_threshold:
The minimum orthologuous identity to find factor annotations for enriched motifs.
Return
---------
A dataframe with motif annotations for each motif
"""
# Create a MultiIndex for the index combining unique gene name and motif ID. This should facilitate
# later merging.
fname = "/home/choudharys/github/JASPAR2022_CORE_vertebrates_non-redundant_pfms_motif_tfs.tbl"
df = pd.read_csv(fname, sep='\t', usecols=column_names)
df.rename(columns={'#motif_id':"MotifID",
'gene_name':"TF",
'motif_similarity_qvalue': "MotifSimilarityQvalue",
'orthologous_identity': "OrthologousIdentity",
'description': "Annotation" }, inplace=True)
df = df[(df["MotifSimilarityQvalue"] <= motif_similarity_fdr) &
(df["OrthologousIdentity"] >= orthologous_identity_threshold)]
# Direct annotation
df_direct_annot = df[df['Annotation'] == 'gene is directly annotated']
df_direct_annot = df_direct_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
df_direct_annot.index = df_direct_annot['MotifID']
df_direct_annot = pd.DataFrame(df_direct_annot['TF'])
df_direct_annot.columns = ['Direct_annot']
# Indirect annotation - by motif similarity
motif_similarity_annot = df[df['Annotation'].str.contains('similar') & ~df['Annotation'].str.contains('orthologous')]
motif_similarity_annot = motif_similarity_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
motif_similarity_annot.index = motif_similarity_annot['MotifID']
motif_similarity_annot = pd.DataFrame(motif_similarity_annot['TF'])
motif_similarity_annot.columns = ['Motif_similarity_annot']
# Indirect annotation - by orthology
orthology_annot = df[~df['Annotation'].str.contains('similar') & df['Annotation'].str.contains('orthologous')]
orthology_annot = orthology_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
orthology_annot.index = orthology_annot['MotifID']
orthology_annot = pd.DataFrame(orthology_annot['TF'])
orthology_annot.columns = ['Orthology_annot']
# Indirect annotation - by orthology
motif_similarity_and_orthology_annot = df[df['Annotation'].str.contains('similar') & df['Annotation'].str.contains('orthologous')]
motif_similarity_and_orthology_annot = motif_similarity_and_orthology_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
motif_similarity_and_orthology_annot.index = motif_similarity_and_orthology_annot['MotifID']
motif_similarity_and_orthology_annot = pd.DataFrame(motif_similarity_and_orthology_annot['TF'])
motif_similarity_and_orthology_annot.columns = ['Motif_similarity_and_Orthology_annot']
# Combine
df = pd.concat([df_direct_annot, motif_similarity_annot, orthology_annot, motif_similarity_and_orthology_annot], axis=1, sort=False)
return df I do get back a dataframe with the motif ids annotated: But it seems they are not getting assigned an annotation in the final output. Do you happen to know what am I missing? |
Hi @saketkc ! MotifID in your dataframe is not correct, it should be Cheers! Carmen |
Ahh, that was it. Thanks a lot both! |
Hi Scenic+ Team, I am also getting the same error as described in the above thread. I am following the exact same tutorial as described here and using this :https://scenicplus.readthedocs.io/en/latest/pbmc_multiome_tutorial.html#inferring-enhancer-driven-Gene-Regulatory-Networks-(eGRNs)-using-SCENIC+ and i am using this motif to TF file : motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl available at https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl My from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
region_sets = region_sets,
species = 'homo_sapiens',
save_path = os.path.join(work_dir, 'motifs'),
ctx_db_path = rankings_db,
dem_db_path = scores_db,
path_to_motif_annotations = motif_annotation,
run_without_promoters = True,
n_cpu = 1,
_temp_dir = os.path.join("/data/manke/processing/mirza/tmp/", 'ray_spill'),
annotation_version = 'v10nr_clust',
) when I run now when I run this running this: from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
run_scenicplus(
scplus_obj = scplus_obj,
variable = ['GEX_celltype'],
species = 'hsapiens',
assembly = 'hg38',
tf_file = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
save_path = os.path.join(work_dir, 'scenicplus'),
biomart_host = biomart_host,
upstream = [1000, 150000],
downstream = [1000, 150000],
calculate_TF_eGRN_correlation = True,
calculate_DEGs_DARs = True,
export_to_loom_file = True,
export_to_UCSC_file = True,
path_bedToBigBed = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial',
n_cpu = 16,
_temp_dir = os.path.join(tmp_dir, 'ray_spill'))
except Exception as e:
#in case of failure, still save the object
dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
raise(e) I get this error: > 2024-01-18 10:27:31,296 SCENIC+_wrapper INFO /data/manke/processing/mirza/test_scenic/pbmc_tutorial/scenicplus folder already exists.
> 2024-01-18 10:27:31,299 SCENIC+_wrapper INFO Merging cistromes
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> Cell In[30], line 23
> 20 except Exception as e:
> 21 #in case of failure, still save the object
> 22 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
> ---> 23 raise(e)
>
> Cell In[30], line 3
> 1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
> 2 try:
> ----> 3 run_scenicplus(
> 4 scplus_obj = scplus_obj,
> 5 variable = ['GEX_celltype'],
> 6 species = 'hsapiens',
> 7 assembly = 'hg38',
> 8 tf_file = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
> 9 save_path = os.path.join(work_dir, 'scenicplus'),
> 10 biomart_host = biomart_host,
> 11 upstream = [1000, 150000],
> 12 downstream = [1000, 150000],
> 13 calculate_TF_eGRN_correlation = True,
> 14 calculate_DEGs_DARs = True,
> 15 export_to_loom_file = True,
> 16 export_to_UCSC_file = True,
> 17 path_bedToBigBed = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial',
> 18 n_cpu = 16,
> 19 _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
> 20 except Exception as e:
> 21 #in case of failure, still save the object
> 22 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
>
> File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py:129, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
> 127 if 'Cistromes' not in scplus_obj.uns.keys():
> 128 log.info('Merging cistromes')
> --> 129 merge_cistromes(scplus_obj)
> 132 if 'search_space' not in scplus_obj.uns.keys():
> 133 log.info('Getting search space')
>
> File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/cistromes.py:90, in merge_cistromes(scplus_obj, cistromes_key, subset)
> 87 signatures_extend = {k:v for k,v in signatures_extend.items() if v}
> 89 if len(signatures_direct.keys()) == 0 and len(signatures_extend.keys()) == 0:
> ---> 90 raise ValueError("No cistromes found! Make sure that the motif enrichment results look good!")
> 92 #merge regions by TF name
> 93 if len(signatures_direct.keys()) > 0:
>
> ValueError: No cistromes found! Make sure that the motif enrichment results look good!
> I would be grateful if somebody can help me solve this issue. Thanks a bunch! |
Hi @mairamirza Thanks for using SCENIC+. Can you check wether your motifs are annotated to TFs (in the html files)? In the screenshot you are showing I don't see any annotation (there should be TF names listed), but this might be because the screenshot is cutoff. All the best, Seppe |
Hi @SeppeDeWinter , Thanks for the reply. I have these following html files in my motif directory.
My B_cells1.html and Topic6.html files looks like these respectively I am not sure if you are referring to these files. I am new to this, could you please guide me further. Thanks. |
Hi @mairamirza Indeed, these are the correct files. However the motif-to-TF annotation is missing (at least one of the following columns should be present: Can you check wether you are using the correct motif-to-TF annotation file? All the best, Seppe |
Describe the bug
I am trying to run scenicplus on a Pig dataset. I have generated motif scores following the steps in
create_cisTarget_databases
. I had to make multiple changes in the codebase to add support for this custom workflow, but I am running into issues with
merge_cistromes` step.run_pycistarget
goes through without issues and creates cistromes:The DEM hits make sense:
However, at the
merge_cistromes(scplus_obj)
step, I get an errorNo cistromes found! Make sure that the motif enrichment results look good!
To Reproduce
I am not sure how to provide a reproducible example, but if you could suggest a workflow for running Scenic+ on custom organisms, that would be wonderful! It is quite possible that I am missing something obvious (possibly because I had to change the code to handle a new motif database).
Error output
Version (please complete the following information):
Python: 3.7.10
SCENIC+: '0.1.dev445+g615c88b.d20220902'
If a bug is related to another module [e.g. matplotlib 3.3.9]
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: