issue running cistarget using human dataset #87

Open
sid5427 opened this issue Jan 10, 2023 · 12 comments
Labels
question Further information is requested

Comments

@sid5427

sid5427 commented Jan 10, 2023

Hi Seppe and other devs - Happy new year!

Unfortunately, I am facing another issue while running SCENIC+ on our human 10x multiome data.

When I run pycistarget, it throws an error: "ValueError: A gene signature must have at least one gene."

Python version: 3.8.13
SCENIC+ version: not sure. I updated to the latest version on December 30th, but querying the version returns "AttributeError: module 'scenicplus' has no attribute 'version'" (you might want to check this as well).
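
For reporting the installed version when the module exposes no version attribute, a minimal sketch using the standard library could look like the following. The helper name get_installed_version is hypothetical, and "scenicplus" as the distribution name is an assumption; a development install may be registered under a different name.

```python
from importlib import metadata


def get_installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the package is registered under the distribution name "scenicplus".
print(get_installed_version("scenicplus"))
```

This reads the metadata recorded at install time, so it works on Python 3.8 and later regardless of whether the package defines its own version attribute.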

This is how I am setting up pycistarget to run ...

rankings_db = 'data/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'
scores_db =  'data/hg38_screen_v10_clust.regions_vs_motifs.scores.feather'
motif_annotation = 'data/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl' 

import os

# Create the output directory for enriched motifs
os.makedirs('results/motifs', exist_ok=True)

from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = 'results/motifs',
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    #run_without_promoters = True,
    n_cpu = 8,
    _temp_dir = '/users/sen2qb/symlinks/temp_d_d/ray_spill',
    annotation_version = 'v10nr_clust',
    )

output log -

2023-01-09 16:50:26,509 pycisTarget_wrapper INFO     results/motifs folder already exists.
2023-01-09 16:50:28,061 pycisTarget_wrapper INFO     Loading cisTarget database for topics_otsu
2023-01-09 16:50:28,063 cisTarget    INFO     Reading cisTarget database
2023-01-09 16:58:16,508 pycisTarget_wrapper INFO     Running cisTarget for topics_otsu

2023-01-09 16:58:35,199	INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(ctx_internal_ray pid=135215) 2023-01-09 16:58:57,165 cisTarget    INFO     Running cisTarget for Topic1 which has 2683 regions
(ctx_internal_ray pid=135214) 2023-01-09 16:58:57,254 cisTarget    INFO     Running cisTarget for Topic2 which has 7472 regions
(ctx_internal_ray pid=135216) 2023-01-09 16:58:58,224 cisTarget    INFO     Running cisTarget for Topic3 which has 7003 regions
(ctx_internal_ray pid=135212) 2023-01-09 16:58:59,248 cisTarget    INFO     Running cisTarget for Topic4 which has 8008 regions
(ctx_internal_ray pid=135219) 2023-01-09 16:58:59,772 cisTarget    INFO     Running cisTarget for Topic5 which has 4396 regions
(ctx_internal_ray pid=135218) 2023-01-09 16:59:00,092 cisTarget    INFO     Running cisTarget for Topic6 which has 690 regions
(ctx_internal_ray pid=135217) 2023-01-09 16:59:00,544 cisTarget    INFO     Running cisTarget for Topic7 which has 871 regions
(ctx_internal_ray pid=135213) 2023-01-09 16:59:00,939 cisTarget    INFO     Running cisTarget for Topic8 which has 1500 regions
(ctx_internal_ray pid=135214) 2023-01-09 16:59:20,625 cisTarget    INFO     Annotating motifs for Topic2
(ctx_internal_ray pid=135214) 2023-01-09 16:59:23,420 cisTarget    INFO     Getting cistromes for Topic2
(ctx_internal_ray pid=135214) 2023-01-09 16:59:24,558 cisTarget    INFO     Running cisTarget for Topic9 which has 2254 regions
(ctx_internal_ray pid=135218) 2023-01-09 16:59:25,580 cisTarget    INFO     Annotating motifs for Topic6
(ctx_internal_ray pid=135219) 2023-01-09 16:59:26,070 cisTarget    INFO     Annotating motifs for Topic5
(ctx_internal_ray pid=135218) 2023-01-09 16:59:27,338 cisTarget    INFO     Getting cistromes for Topic6
(ctx_internal_ray pid=135218) 2023-01-09 16:59:27,664 cisTarget    INFO     Running cisTarget for Topic10 which has 2709 regions
(ctx_internal_ray pid=135219) 2023-01-09 16:59:28,210 cisTarget    INFO     Getting cistromes for Topic5
(ctx_internal_ray pid=135216) 2023-01-09 16:59:28,323 cisTarget    INFO     Annotating motifs for Topic3
(ctx_internal_ray pid=135215) 2023-01-09 16:59:28,390 cisTarget    INFO     Annotating motifs for Topic1
(ctx_internal_ray pid=135217) 2023-01-09 16:59:28,591 cisTarget    INFO     Annotating motifs for Topic7
(ctx_internal_ray pid=135219) 2023-01-09 16:59:29,130 cisTarget    INFO     Running cisTarget for Topic11 which has 3502 regions
(ctx_internal_ray pid=135215) 2023-01-09 16:59:30,299 cisTarget    INFO     Getting cistromes for Topic1
(ctx_internal_ray pid=135217) 2023-01-09 16:59:30,303 cisTarget    INFO     Getting cistromes for Topic7
(ctx_internal_ray pid=135216) 2023-01-09 16:59:30,495 cisTarget    INFO     Getting cistromes for Topic3
(ctx_internal_ray pid=135217) 2023-01-09 16:59:30,505 cisTarget    INFO     Running cisTarget for Topic12 which has 6396 regions
(ctx_internal_ray pid=135215) 2023-01-09 16:59:31,016 cisTarget    INFO     Running cisTarget for Topic13 which has 1821 regions
(ctx_internal_ray pid=135216) 2023-01-09 16:59:31,509 cisTarget    INFO     Running cisTarget for Topic14 which has 6555 regions
(ctx_internal_ray pid=135213) 2023-01-09 16:59:34,557 cisTarget    INFO     Annotating motifs for Topic8
(ctx_internal_ray pid=135212) 2023-01-09 16:59:36,161 cisTarget    INFO     Annotating motifs for Topic4
(ctx_internal_ray pid=135213) 2023-01-09 16:59:36,496 cisTarget    INFO     Getting cistromes for Topic8
(ctx_internal_ray pid=135213) 2023-01-09 16:59:37,284 cisTarget    INFO     Running cisTarget for Topic15 which has 2558 regions
(ctx_internal_ray pid=135212) 2023-01-09 16:59:38,378 cisTarget    INFO     Getting cistromes for Topic4
(ctx_internal_ray pid=135212) 2023-01-09 16:59:39,515 cisTarget    INFO     Running cisTarget for Topic16 which has 4249 regions
(ctx_internal_ray pid=135214) 2023-01-09 16:59:49,186 cisTarget    INFO     Annotating motifs for Topic9
(ctx_internal_ray pid=135214) 2023-01-09 16:59:51,090 cisTarget    INFO     Getting cistromes for Topic9
(ctx_internal_ray pid=135214) 2023-01-09 16:59:51,626 cisTarget    INFO     Running cisTarget for Topic17 which has 3641 regions
(ctx_internal_ray pid=135218) 2023-01-09 16:59:52,462 cisTarget    INFO     Annotating motifs for Topic10
(ctx_internal_ray pid=135218) 2023-01-09 16:59:54,468 cisTarget    INFO     Getting cistromes for Topic10
(ctx_internal_ray pid=135218) 2023-01-09 16:59:55,062 cisTarget    INFO     Running cisTarget for Topic18 which has 3115 regions
(ctx_internal_ray pid=135215) 2023-01-09 16:59:58,649 cisTarget    INFO     Annotating motifs for Topic13
(ctx_internal_ray pid=135219) 2023-01-09 16:59:59,027 cisTarget    INFO     Annotating motifs for Topic11
(ctx_internal_ray pid=135215) 2023-01-09 17:00:00,584 cisTarget    INFO     Getting cistromes for Topic13
(ctx_internal_ray pid=135219) 2023-01-09 17:00:00,979 cisTarget    INFO     Getting cistromes for Topic11
(ctx_internal_ray pid=135215) 2023-01-09 17:00:01,249 cisTarget    INFO     Running cisTarget for Topic19 which has 4491 regions
(ctx_internal_ray pid=135217) 2023-01-09 17:00:01,349 cisTarget    INFO     Annotating motifs for Topic12
(ctx_internal_ray pid=135219) 2023-01-09 17:00:01,656 cisTarget    INFO     Running cisTarget for Topic20 which has 5425 regions
(ctx_internal_ray pid=135217) 2023-01-09 17:00:03,554 cisTarget    INFO     Getting cistromes for Topic12
(ctx_internal_ray pid=135216) 2023-01-09 17:00:03,891 cisTarget    INFO     Annotating motifs for Topic14
(ctx_internal_ray pid=135217) 2023-01-09 17:00:04,672 cisTarget    INFO     Running cisTarget for Topic21 which has 2658 regions
(ctx_internal_ray pid=135213) 2023-01-09 17:00:05,961 cisTarget    INFO     Annotating motifs for Topic15
(ctx_internal_ray pid=135216) 2023-01-09 17:00:06,166 cisTarget    INFO     Getting cistromes for Topic14
(ctx_internal_ray pid=135216) 2023-01-09 17:00:07,369 cisTarget    INFO     Running cisTarget for Topic22 which has 4600 regions
(ctx_internal_ray pid=135212) 2023-01-09 17:00:07,806 cisTarget    INFO     Annotating motifs for Topic16
(ctx_internal_ray pid=135213) 2023-01-09 17:00:07,982 cisTarget    INFO     Getting cistromes for Topic15
(ctx_internal_ray pid=135213) 2023-01-09 17:00:08,768 cisTarget    INFO     Running cisTarget for Topic23 which has 3878 regions
(ctx_internal_ray pid=135212) 2023-01-09 17:00:09,889 cisTarget    INFO     Getting cistromes for Topic16
(ctx_internal_ray pid=135212) 2023-01-09 17:00:10,822 cisTarget    INFO     Running cisTarget for Topic24 which has 6484 regions
(ctx_internal_ray pid=135214) 2023-01-09 17:00:20,196 cisTarget    INFO     Annotating motifs for Topic17
(ctx_internal_ray pid=135218) 2023-01-09 17:00:21,711 cisTarget    INFO     Annotating motifs for Topic18
(ctx_internal_ray pid=135214) 2023-01-09 17:00:22,376 cisTarget    INFO     Getting cistromes for Topic17
(ctx_internal_ray pid=135214) 2023-01-09 17:00:23,530 cisTarget    INFO     Running cisTarget for Topic25 which has 3604 regions
(ctx_internal_ray pid=135218) 2023-01-09 17:00:24,000 cisTarget    INFO     Getting cistromes for Topic18
(ctx_internal_ray pid=135218) 2023-01-09 17:00:25,434 cisTarget    INFO     Running cisTarget for Topic26 which has 6203 regions
(ctx_internal_ray pid=135219) 2023-01-09 17:00:29,809 cisTarget    INFO     Annotating motifs for Topic20
(ctx_internal_ray pid=135215) 2023-01-09 17:00:30,175 cisTarget    INFO     Annotating motifs for Topic19
(ctx_internal_ray pid=135219) 2023-01-09 17:00:32,346 cisTarget    INFO     Getting cistromes for Topic20
(ctx_internal_ray pid=135215) 2023-01-09 17:00:32,483 cisTarget    INFO     Getting cistromes for Topic19
(ctx_internal_ray pid=135217) 2023-01-09 17:00:32,618 cisTarget    INFO     Annotating motifs for Topic21
(ctx_internal_ray pid=135217) 2023-01-09 17:00:34,519 cisTarget    INFO     Getting cistromes for Topic21
(ctx_internal_ray pid=135216) 2023-01-09 17:00:35,464 cisTarget    INFO     Annotating motifs for Topic22
(ctx_internal_ray pid=135216) 2023-01-09 17:00:37,579 cisTarget    INFO     Getting cistromes for Topic22
(ctx_internal_ray pid=135213) 2023-01-09 17:00:38,689 cisTarget    INFO     Annotating motifs for Topic23
(ctx_internal_ray pid=135213) 2023-01-09 17:00:40,970 cisTarget    INFO     Getting cistromes for Topic23
(ctx_internal_ray pid=135212) 2023-01-09 17:00:41,362 cisTarget    INFO     Annotating motifs for Topic24
(ctx_internal_ray pid=135212) 2023-01-09 17:00:43,648 cisTarget    INFO     Getting cistromes for Topic24
(ctx_internal_ray pid=135214) 2023-01-09 17:00:47,327 cisTarget    INFO     Annotating motifs for Topic25
(ctx_internal_ray pid=135218) 2023-01-09 17:00:47,488 cisTarget    INFO     Annotating motifs for Topic26
(ctx_internal_ray pid=135214) 2023-01-09 17:00:49,153 cisTarget    INFO     Getting cistromes for Topic25
(ctx_internal_ray pid=135218) 2023-01-09 17:00:49,796 cisTarget    INFO     Getting cistromes for Topic26
2023-01-09 17:00:56,898 cisTarget    INFO     Done!
2023-01-09 17:00:56,903 pycisTarget_wrapper INFO     Created folder : results/motifs/CTX_topics_otsu_All
2023-01-09 17:00:57,533 pycisTarget_wrapper INFO     Running DEM for topics_otsu
2023-01-09 17:00:57,535 DEM          INFO     Reading DEM database
2023-01-09 17:05:54,352 DEM          INFO     Creating contrast groups

2023-01-09 17:06:23,981	INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(DEM_internal_ray pid=136708) 2023-01-09 17:06:50,842 DEM          INFO     Computing DEM for Topic1
(DEM_internal_ray pid=136712) 2023-01-09 17:06:52,131 DEM          INFO     Computing DEM for Topic2
(DEM_internal_ray pid=136713) 2023-01-09 17:06:52,434 DEM          INFO     Computing DEM for Topic6
(DEM_internal_ray pid=136711) 2023-01-09 17:06:52,647 DEM          INFO     Computing DEM for Topic7
(DEM_internal_ray pid=136710) 2023-01-09 17:06:52,784 DEM          INFO     Computing DEM for Topic3
(DEM_internal_ray pid=136709) 2023-01-09 17:06:52,917 DEM          INFO     Computing DEM for Topic8
(DEM_internal_ray pid=136714) 2023-01-09 17:06:53,228 DEM          INFO     Computing DEM for Topic5
(DEM_internal_ray pid=136707) 2023-01-09 17:06:53,503 DEM          INFO     Computing DEM for Topic4
(DEM_internal_ray pid=136708) 2023-01-09 17:06:57,700 DEM          INFO     Computing DEM for Topic9
(DEM_internal_ray pid=136710) 2023-01-09 17:06:58,965 DEM          INFO     Computing DEM for Topic10
(DEM_internal_ray pid=136707) 2023-01-09 17:07:00,126 DEM          INFO     Computing DEM for Topic11
(DEM_internal_ray pid=136709) 2023-01-09 17:07:00,839 DEM          INFO     Computing DEM for Topic13
(DEM_internal_ray pid=136712) 2023-01-09 17:07:01,429 DEM          INFO     Computing DEM for Topic12
(DEM_internal_ray pid=136714) 2023-01-09 17:07:02,329 DEM          INFO     Computing DEM for Topic14
(DEM_internal_ray pid=136713) 2023-01-09 17:07:03,336 DEM          INFO     Computing DEM for Topic15
(DEM_internal_ray pid=136711) 2023-01-09 17:07:04,646 DEM          INFO     Computing DEM for Topic16
(DEM_internal_ray pid=136707) 2023-01-09 17:07:06,136 DEM          INFO     Computing DEM for Topic17
(DEM_internal_ray pid=136712) 2023-01-09 17:07:07,656 DEM          INFO     Computing DEM for Topic18
(DEM_internal_ray pid=136714) 2023-01-09 17:07:09,169 DEM          INFO     Computing DEM for Topic19
(DEM_internal_ray pid=136708) 2023-01-09 17:07:15,762 DEM          INFO     Computing DEM for Topic20
(DEM_internal_ray pid=136707) 2023-01-09 17:07:15,929 DEM          INFO     Computing DEM for Topic21
(DEM_internal_ray pid=136709) 2023-01-09 17:07:18,049 DEM          INFO     Computing DEM for Topic22
(DEM_internal_ray pid=136711) 2023-01-09 17:07:21,557 DEM          INFO     Computing DEM for Topic23
(DEM_internal_ray pid=136714) 2023-01-09 17:07:23,525 DEM          INFO     Computing DEM for Topic24
(DEM_internal_ray pid=136709) 2023-01-09 17:07:24,388 DEM          INFO     Computing DEM for Topic25
(DEM_internal_ray pid=136707) 2023-01-09 17:07:26,658 DEM          INFO     Computing DEM for Topic26
2023-01-09 17:07:49,100 DEM          INFO     Forming cistromes
2023-01-09 17:07:59,043 DEM          INFO     Done!
2023-01-09 17:08:04,669 pycisTarget_wrapper INFO     Created folder : results/motifs/DEM_topics_otsu_All
2023-01-09 17:08:05,487 pycisTarget_wrapper INFO     Loading cisTarget database for topics_top_3
2023-01-09 17:08:05,488 cisTarget    INFO     Reading cisTarget database
2023-01-09 17:11:17,749 pycisTarget_wrapper INFO     Running cisTarget for topics_top_3

2023-01-09 17:11:27,812	INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(ctx_internal_ray pid=138080) 2023-01-09 17:11:50,897 cisTarget    INFO     Running cisTarget for Topic1 which has 3269 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:11:51,040 cisTarget    INFO     Running cisTarget for Topic2 which has 3595 regions
(ctx_internal_ray pid=138078) 2023-01-09 17:11:52,108 cisTarget    INFO     Running cisTarget for Topic3 which has 3678 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:11:52,669 cisTarget    INFO     Running cisTarget for Topic4 which has 3790 regions
(ctx_internal_ray pid=138074) 2023-01-09 17:11:53,292 cisTarget    INFO     Running cisTarget for Topic5 which has 3428 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:11:53,748 cisTarget    INFO     Running cisTarget for Topic6 which has 3543 regions
(ctx_internal_ray pid=138073) 2023-01-09 17:11:54,196 cisTarget    INFO     Running cisTarget for Topic7 which has 3630 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:11:54,286 cisTarget    INFO     Running cisTarget for Topic8 which has 3832 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:12:12,492 cisTarget    INFO     Annotating motifs for Topic2
(ctx_internal_ray pid=138077) 2023-01-09 17:12:14,462 cisTarget    INFO     Getting cistromes for Topic2
(ctx_internal_ray pid=138077) 2023-01-09 17:12:15,317 cisTarget    INFO     Running cisTarget for Topic9 which has 3372 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:12:17,210 cisTarget    INFO     Annotating motifs for Topic6
(ctx_internal_ray pid=138080) 2023-01-09 17:12:18,413 cisTarget    INFO     Annotating motifs for Topic1
(ctx_internal_ray pid=138075) 2023-01-09 17:12:19,137 cisTarget    INFO     Getting cistromes for Topic6
(ctx_internal_ray pid=138074) 2023-01-09 17:12:19,815 cisTarget    INFO     Annotating motifs for Topic5
(ctx_internal_ray pid=138075) 2023-01-09 17:12:19,728 cisTarget    INFO     Running cisTarget for Topic10 which has 3470 regions
(ctx_internal_ray pid=138080) 2023-01-09 17:12:20,193 cisTarget    INFO     Getting cistromes for Topic1
(ctx_internal_ray pid=138080) 2023-01-09 17:12:20,920 cisTarget    INFO     Running cisTarget for Topic11 which has 3553 regions
(ctx_internal_ray pid=138074) 2023-01-09 17:12:21,751 cisTarget    INFO     Getting cistromes for Topic5
(ctx_internal_ray pid=138073) 2023-01-09 17:12:21,883 cisTarget    INFO     Annotating motifs for Topic7
(ctx_internal_ray pid=138074) 2023-01-09 17:12:22,414 cisTarget    INFO     Running cisTarget for Topic12 which has 3981 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:12:22,490 cisTarget    INFO     Annotating motifs for Topic4
(ctx_internal_ray pid=138076) 2023-01-09 17:12:23,682 cisTarget    INFO     Annotating motifs for Topic8
(ctx_internal_ray pid=138078) 2023-01-09 17:12:23,775 cisTarget    INFO     Annotating motifs for Topic3
(ctx_internal_ray pid=138073) 2023-01-09 17:12:23,737 cisTarget    INFO     Getting cistromes for Topic7
(ctx_internal_ray pid=138073) 2023-01-09 17:12:24,313 cisTarget    INFO     Running cisTarget for Topic13 which has 3447 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:12:24,425 cisTarget    INFO     Getting cistromes for Topic4
(ctx_internal_ray pid=138079) 2023-01-09 17:12:25,120 cisTarget    INFO     Running cisTarget for Topic14 which has 3970 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:12:25,614 cisTarget    INFO     Getting cistromes for Topic8
(ctx_internal_ray pid=138078) 2023-01-09 17:12:25,722 cisTarget    INFO     Getting cistromes for Topic3
(ctx_internal_ray pid=138078) 2023-01-09 17:12:26,402 cisTarget    INFO     Running cisTarget for Topic15 which has 3276 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:12:26,663 cisTarget    INFO     Running cisTarget for Topic16 which has 3415 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:12:30,842 cisTarget    INFO     Annotating motifs for Topic9
(ctx_internal_ray pid=138077) 2023-01-09 17:12:32,808 cisTarget    INFO     Getting cistromes for Topic9
(ctx_internal_ray pid=138077) 2023-01-09 17:12:33,499 cisTarget    INFO     Running cisTarget for Topic17 which has 3731 regions
(ctx_internal_ray pid=138080) 2023-01-09 17:12:40,586 cisTarget    INFO     Annotating motifs for Topic11
(ctx_internal_ray pid=138080) 2023-01-09 17:12:42,343 cisTarget    INFO     Getting cistromes for Topic11
(ctx_internal_ray pid=138080) 2023-01-09 17:12:42,915 cisTarget    INFO     Running cisTarget for Topic18 which has 3838 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:12:43,710 cisTarget    INFO     Annotating motifs for Topic10
(ctx_internal_ray pid=138075) 2023-01-09 17:12:45,779 cisTarget    INFO     Getting cistromes for Topic10
(ctx_internal_ray pid=138075) 2023-01-09 17:12:46,448 cisTarget    INFO     Running cisTarget for Topic19 which has 3817 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:12:47,486 cisTarget    INFO     Annotating motifs for Topic16
(ctx_internal_ray pid=138076) 2023-01-09 17:12:49,396 cisTarget    INFO     Getting cistromes for Topic16
(ctx_internal_ray pid=138076) 2023-01-09 17:12:50,159 cisTarget    INFO     Running cisTarget for Topic20 which has 4003 regions
(ctx_internal_ray pid=138073) 2023-01-09 17:12:50,711 cisTarget    INFO     Annotating motifs for Topic13
(ctx_internal_ray pid=138074) 2023-01-09 17:12:51,042 cisTarget    INFO     Annotating motifs for Topic12
(ctx_internal_ray pid=138073) 2023-01-09 17:12:52,745 cisTarget    INFO     Getting cistromes for Topic13
(ctx_internal_ray pid=138079) 2023-01-09 17:12:52,949 cisTarget    INFO     Annotating motifs for Topic14
(ctx_internal_ray pid=138074) 2023-01-09 17:12:53,047 cisTarget    INFO     Getting cistromes for Topic12
(ctx_internal_ray pid=138073) 2023-01-09 17:12:53,570 cisTarget    INFO     Running cisTarget for Topic21 which has 3309 regions
(ctx_internal_ray pid=138074) 2023-01-09 17:12:53,893 cisTarget    INFO     Running cisTarget for Topic22 which has 3774 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:12:54,977 cisTarget    INFO     Getting cistromes for Topic14
(ctx_internal_ray pid=138079) 2023-01-09 17:12:55,752 cisTarget    INFO     Running cisTarget for Topic23 which has 3997 regions
(ctx_internal_ray pid=138078) 2023-01-09 17:12:56,224 cisTarget    INFO     Annotating motifs for Topic15
(ctx_internal_ray pid=138078) 2023-01-09 17:12:58,306 cisTarget    INFO     Getting cistromes for Topic15
(ctx_internal_ray pid=138078) 2023-01-09 17:12:59,182 cisTarget    INFO     Running cisTarget for Topic24 which has 3870 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:12:59,841 cisTarget    INFO     Annotating motifs for Topic17
(ctx_internal_ray pid=138077) 2023-01-09 17:13:02,071 cisTarget    INFO     Getting cistromes for Topic17
(ctx_internal_ray pid=138077) 2023-01-09 17:13:03,306 cisTarget    INFO     Running cisTarget for Topic25 which has 3465 regions
(ctx_internal_ray pid=138080) 2023-01-09 17:13:04,487 cisTarget    INFO     Annotating motifs for Topic18
(ctx_internal_ray pid=138080) 2023-01-09 17:13:06,924 cisTarget    INFO     Getting cistromes for Topic18
(ctx_internal_ray pid=138075) 2023-01-09 17:13:07,678 cisTarget    INFO     Annotating motifs for Topic19
(ctx_internal_ray pid=138080) 2023-01-09 17:13:08,587 cisTarget    INFO     Running cisTarget for Topic26 which has 3928 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:13:09,932 cisTarget    INFO     Getting cistromes for Topic19
(ctx_internal_ray pid=138076) 2023-01-09 17:13:12,149 cisTarget    INFO     Annotating motifs for Topic20
(ctx_internal_ray pid=138076) 2023-01-09 17:13:14,262 cisTarget    INFO     Getting cistromes for Topic20
(ctx_internal_ray pid=138074) 2023-01-09 17:13:16,666 cisTarget    INFO     Annotating motifs for Topic22
(ctx_internal_ray pid=138073) 2023-01-09 17:13:16,911 cisTarget    INFO     Annotating motifs for Topic21
(ctx_internal_ray pid=138074) 2023-01-09 17:13:18,713 cisTarget    INFO     Getting cistromes for Topic22
(ctx_internal_ray pid=138073) 2023-01-09 17:13:18,931 cisTarget    INFO     Getting cistromes for Topic21
(ctx_internal_ray pid=138079) 2023-01-09 17:13:21,074 cisTarget    INFO     Annotating motifs for Topic23
(ctx_internal_ray pid=138079) 2023-01-09 17:13:23,286 cisTarget    INFO     Getting cistromes for Topic23
(ctx_internal_ray pid=138077) 2023-01-09 17:13:23,652 cisTarget    INFO     Annotating motifs for Topic25
(ctx_internal_ray pid=138078) 2023-01-09 17:13:24,470 cisTarget    INFO     Annotating motifs for Topic24
(ctx_internal_ray pid=138077) 2023-01-09 17:13:25,550 cisTarget    INFO     Getting cistromes for Topic25
(ctx_internal_ray pid=138078) 2023-01-09 17:13:26,409 cisTarget    INFO     Getting cistromes for Topic24
(ctx_internal_ray pid=138080) 2023-01-09 17:13:29,032 cisTarget    INFO     Annotating motifs for Topic26
(ctx_internal_ray pid=138080) 2023-01-09 17:13:31,048 cisTarget    INFO     Getting cistromes for Topic26
2023-01-09 17:13:36,700 cisTarget    INFO     Done!
2023-01-09 17:13:36,706 pycisTarget_wrapper INFO     Created folder : results/motifs/CTX_topics_top_3_All
2023-01-09 17:13:37,328 pycisTarget_wrapper INFO     Running DEM for topics_top_3
2023-01-09 17:13:37,330 DEM          INFO     Reading DEM database
2023-01-09 17:16:24,839 DEM          INFO     Creating contrast groups

2023-01-09 17:16:55,424	INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(DEM_internal_ray pid=139390) 2023-01-09 17:17:21,269 DEM          INFO     Computing DEM for Topic1
(DEM_internal_ray pid=139389) 2023-01-09 17:17:21,557 DEM          INFO     Computing DEM for Topic2
(DEM_internal_ray pid=139388) 2023-01-09 17:17:22,012 DEM          INFO     Computing DEM for Topic3
(DEM_internal_ray pid=139387) 2023-01-09 17:17:22,777 DEM          INFO     Computing DEM for Topic4
(DEM_internal_ray pid=139391) 2023-01-09 17:17:22,720 DEM          INFO     Computing DEM for Topic5
(DEM_internal_ray pid=139384) 2023-01-09 17:17:22,802 DEM          INFO     Computing DEM for Topic6
(DEM_internal_ray pid=139386) 2023-01-09 17:17:23,193 DEM          INFO     Computing DEM for Topic7
(DEM_internal_ray pid=139385) 2023-01-09 17:17:23,242 DEM          INFO     Computing DEM for Topic8
(DEM_internal_ray pid=139388) 2023-01-09 17:17:27,334 DEM          INFO     Computing DEM for Topic9
(DEM_internal_ray pid=139387) 2023-01-09 17:17:28,507 DEM          INFO     Computing DEM for Topic10
(DEM_internal_ray pid=139390) 2023-01-09 17:17:28,694 DEM          INFO     Computing DEM for Topic11
(DEM_internal_ray pid=139385) 2023-01-09 17:17:29,732 DEM          INFO     Computing DEM for Topic12
(DEM_internal_ray pid=139391) 2023-01-09 17:17:31,412 DEM          INFO     Computing DEM for Topic13
(DEM_internal_ray pid=139384) 2023-01-09 17:17:31,561 DEM          INFO     Computing DEM for Topic14
(DEM_internal_ray pid=139389) 2023-01-09 17:17:32,458 DEM          INFO     Computing DEM for Topic15
(DEM_internal_ray pid=139390) 2023-01-09 17:17:34,263 DEM          INFO     Computing DEM for Topic16
(DEM_internal_ray pid=139386) 2023-01-09 17:17:34,882 DEM          INFO     Computing DEM for Topic17
(DEM_internal_ray pid=139385) 2023-01-09 17:17:35,623 DEM          INFO     Computing DEM for Topic18
(DEM_internal_ray pid=139384) 2023-01-09 17:17:37,134 DEM          INFO     Computing DEM for Topic19
(DEM_internal_ray pid=139387) 2023-01-09 17:17:45,269 DEM          INFO     Computing DEM for Topic20
(DEM_internal_ray pid=139386) 2023-01-09 17:17:45,445 DEM          INFO     Computing DEM for Topic21
(DEM_internal_ray pid=139384) 2023-01-09 17:17:46,820 DEM          INFO     Computing DEM for Topic22
(DEM_internal_ray pid=139390) 2023-01-09 17:17:48,670 DEM          INFO     Computing DEM for Topic23
(DEM_internal_ray pid=139391) 2023-01-09 17:17:49,197 DEM          INFO     Computing DEM for Topic24
(DEM_internal_ray pid=139388) 2023-01-09 17:17:49,289 DEM          INFO     Computing DEM for Topic25
(DEM_internal_ray pid=139384) 2023-01-09 17:17:53,272 DEM          INFO     Computing DEM for Topic26
2023-01-09 17:18:11,647 DEM          INFO     Forming cistromes
2023-01-09 17:18:18,833 DEM          INFO     Done!
2023-01-09 17:18:23,679 pycisTarget_wrapper INFO     Created folder : results/motifs/DEM_topics_top_3_All
2023-01-09 17:18:24,360 pycisTarget_wrapper INFO     Loading cisTarget database for DARs
2023-01-09 17:18:24,362 cisTarget    INFO     Reading cisTarget database

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[77], line 2
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = 'results/motifs',
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     #run_without_promoters = True,
     10     n_cpu = 8,
     11     _temp_dir = '/users/sen2qb/symlinks/temp_d_d/ray_spill',
     12     annotation_version = 'v10nr_clust',
     13     )

File ~/testing_area/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)  
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File ~/testing_area/pycistarget/pycistarget/motif_enrichment_cistarget.py:67, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     48 def __init__(self, 
     49             fname: str,
     50             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     51             name: str = None,
     52             fraction_overlap: float = 0.4):
     53     """
     54     Initialize cisTargetDatabase
     55     
   (...)
     65         Minimal overlap between query and regions in the database for the mapping.     
     66     """
---> 67     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     68                                                       region_sets,
     69                                                       name,
     70                                                       fraction_overlap)

File ~/testing_area/pycistarget/pycistarget/motif_enrichment_cistarget.py:131, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    129 if prefix is not None:
    130     target_regions_in_db = [prefix + '__' + x for x in target_regions_in_db]
--> 131 target_regions_in_db = GeneSignature(name=name, gene2weight=target_regions_in_db)
    132 db_rankings = db.load(target_regions_in_db)
    133 if prefix is not None:

File <attrs generated init ctxcore.genesig.GeneSignature>:8, in __init__(self, name, gene2weight)
      6 if _config._run_validators is True:
      7     __attr_validator_name(self, __attr_name, self.name)
----> 8     __attr_validator_gene2weight(self, __attr_gene2weight, self.gene2weight)

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/ctxcore/genesig.py:172, in GeneSignature.gene2weight_validator(self, attribute, value)
    169 @gene2weight.validator
    170 def gene2weight_validator(self, attribute, value) -> None:
    171     if len(value) == 0:
--> 172         raise ValueError("A gene signature must have at least one gene.")

ValueError: A gene signature must have at least one gene.

I looked at other error reports, namely #60, and tried the same command with save_partial = True and run_without_promoters = True.

I get the same error (error log).

Would it be possible to create the SCENIC+ object and run the SCENIC+ function with a few of the partial result pickle files (e.g. CTX_topics_otsu_All.pkl) instead of menr.pkl?

Thanks!
Sid.

@sid5427 sid5427 added the question Further information is requested label Jan 10, 2023
@SeppeDeWinter
Collaborator

Hi @sid5427

From the error I suspect that region_sets['DARs'] might be empty or contain empty entries.
Could you show the output of region_sets['DARs'] to confirm this?
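
A quick way to check this is sketched below. It assumes region_sets is a dictionary of dictionaries, as in the tutorial-style setup (region_sets['DARs'][name]), and that each entry supports len(), which PyRanges objects do. The helper name find_empty_region_sets is hypothetical and not part of pycistarget or SCENIC+.

```python
def find_empty_region_sets(region_sets):
    """Return (group, name) pairs for empty entries; name is None if the whole group is empty."""
    empty = []
    for group, sets in region_sets.items():
        if len(sets) == 0:
            # The group itself has no entries at all.
            empty.append((group, None))
            continue
        for name, regions in sets.items():
            if len(regions) == 0:
                # This entry exists but contains no regions.
                empty.append((group, name))
    return empty
```

Calling find_empty_region_sets(region_sets) before run_pycistarget should list any group or entry that would lead to an empty signature.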

As for your question whether it is possible to run SCENIC+ with only some of the partial results: yes, this is possible. You can generate the menr dictionary like this (in your case):

import dill
CTX_topics_otsu_All = dill.load(open('results/motifs/CTX_topics_otsu_All.pkl', 'rb'))
DEM_topics_otsu_All = dill.load(open('results/motifs/DEM_topics_otsu_All.pkl', 'rb'))
CTX_topics_top_3_All = dill.load(open('results/motifs/CTX_topics_top_3_All.pkl', 'rb'))
DEM_topics_top_3_All = dill.load(open('results/motifs/DEM_topics_top_3_All.pkl', 'rb'))

# Start from an empty dictionary, then add each partial result under its own key.
menr = {}
menr['CTX_topics_otsu_All'] = CTX_topics_otsu_All
menr['DEM_topics_otsu_All'] = DEM_topics_otsu_All
menr['CTX_topics_top_3_All'] = CTX_topics_top_3_All
menr['DEM_topics_top_3_All'] = DEM_topics_top_3_All
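If the set of partial files differs from run to run, the same dictionary could be built from whatever .pkl files are present. The sketch below is an assumption-laden variant of the snippet above, not part of scenicplus: load_partial_results is a hypothetical helper, it assumes each key should equal the file name without the .pkl extension, and it defaults to the standard-library pickle.load only so that it runs without extra packages.

```python
import pickle
from pathlib import Path


def load_partial_results(result_dir, load=pickle.load):
    """Collect every *.pkl file in result_dir into a dict keyed by file stem."""
    menr = {}
    for path in sorted(Path(result_dir).glob("*.pkl")):
        if path.name == "menr.pkl":
            continue  # skip a combined file if one happens to exist
        with path.open("rb") as handle:  # the file is closed after loading
            menr[path.stem] = load(handle)
    return menr
```

For the files in this thread the call would presumably look like menr = load_partial_results('results/motifs', load=dill.load), since the partial results were loaded with dill above.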

Best,

Seppe

@sid5427
Author

sid5427 commented Jan 11, 2023

Hi Seppe,

That's the weird part - when I run the code section for finding DARs in markers_dict

for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    #print(regions)
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
    print("pr.PyRanges(region_names_to_coordinates(regions))")

I get this error -

pr.PyRanges(region_names_to_coordinates(regions))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 4
      2 regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
      3 #print(regions)
----> 4 region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
      5 print("pr.PyRanges(region_names_to_coordinates(regions))")

File ~/testing_area/pycistarget/pycistarget/utils.py:33, in region_names_to_coordinates(region_names)
     31 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)
     32 regiondf.index=[i for i in region_names if ':' in i]
---> 33 regiondf.columns=['Chromosome', 'Start', 'End']
     34 return(regiondf)

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/generic.py:5915, in NDFrame.__setattr__(self, name, value)
   5913 try:
   5914     object.__getattribute__(self, name)
-> 5915     return object.__setattr__(self, name, value)
   5916 except AttributeError:
   5917     pass

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/_libs/properties.pyx:69, in pandas._libs.properties.AxisProperty.__set__()

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/generic.py:823, in NDFrame._set_axis(self, axis, labels)
    821 def _set_axis(self, axis: int, labels: AnyArrayLike | list) -> None:
    822     labels = ensure_index(labels)
--> 823     self._mgr.set_axis(axis, labels)
    824     self._clear_item_cache()

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/internals/managers.py:227, in BaseBlockManager.set_axis(self, axis, new_labels)
    225 def set_axis(self, axis: int, new_labels: Index) -> None:
    226     # Caller is responsible for ensuring we have an Index object.
--> 227     self._validate_set_axis(axis, new_labels)
    228     self.axes[axis] = new_labels

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/internals/base.py:70, in DataManager._validate_set_axis(self, axis, new_labels)
     67     pass
     69 elif new_len != old_len:
---> 70     raise ValueError(
     71         f"Length mismatch: Expected axis has {old_len} elements, new "
     72         f"values have {new_len} elements"
     73     )

ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

However if I run region_sets['DARs'] after that - I get this -

{'BMCP': +--------------+-----------+-----------+
 | Chromosome   | Start     | End       |
 | (category)   | (int32)   | (int32)   |
 |--------------+-----------+-----------|
 | chr1         | 21353367  | 21353867  |
 | chr1         | 27542533  | 27543033  |
 | chr1         | 147377812 | 147378312 |
 | chr1         | 186195347 | 186195847 |
 | ...          | ...       | ...       |
 | chrX         | 130179564 | 130180064 |
 | chrX         | 129957183 | 129957683 |
 | chrX         | 109848273 | 109848773 |
 | chrX         | 41257752  | 41258252  |
 +--------------+-----------+-----------+
 Unstranded PyRanges object has 3,635 rows and 3 columns from 23 chromosomes.
 For printing, the PyRanges was sorted on Chromosome.}

I went ahead and printed markers_dict, and this is what I get. It looks like scenic does not detect markers for certain cell types, even though the call that produces it, markers_dict = find_diff_features(cistopic_obj, imputed_acc_obj, variable='celltype', var_features=variable_regions, split_pattern = '-'), completes successfully:

{'BMCP':                             Log2FC Adjusted_pval Contrast
chr8:73520503-73521003    4.247113           0.0     BMCP
chr1:21353367-21353867    4.242884           0.0     BMCP
chr11:44780868-44781368   4.219286           0.0     BMCP
chr13:44397846-44398346   4.167223           0.0     BMCP
chr1:27542533-27543033    4.164805           0.0     BMCP
...                            ...           ...      ...
chr22:38768855-38769355   0.586214           0.0     BMCP
chr7:15977629-15978129    0.585935           0.0     BMCP
chr5:88884545-88885045    0.585763           0.0     BMCP
chr5:150129881-150130381  0.585325           0.0     BMCP
chr3:195853922-195854422   0.58524           0.0     BMCP

[3636 rows x 3 columns], 'CD14-Mono': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'CD34 Gran-ATAC':                             Log2FC Adjusted_pval        Contrast
chr12:2294663-2295163     4.291807           0.0  CD34 Gran-ATAC
chrX:15911664-15912164     4.25781           0.0  CD34 Gran-ATAC
chr3:128523503-128524003  4.191673           0.0  CD34 Gran-ATAC
chr9:129058106-129058606  4.185681           0.0  CD34 Gran-ATAC
chr13:28468022-28468522   4.104304           0.0  CD34 Gran-ATAC
...                            ...           ...             ...
chr9:127967948-127968448  0.585811      0.000347  CD34 Gran-ATAC
chrX:56813483-56813983    0.585558      0.000001  CD34 Gran-ATAC
chr14:35286835-35287335   0.585539      0.000006  CD34 Gran-ATAC
chr11:72787723-72788223   0.585364      0.000001  CD34 Gran-ATAC
chr2:20424749-20425249     0.58513           0.0  CD34 Gran-ATAC

[3357 rows x 3 columns], 'CLP':                              Log2FC Adjusted_pval Contrast
chr2:231672428-231672928   2.806754      0.001634      CLP
chr16:29339242-29339742     2.58527       0.00002      CLP
chr7:2723132-2723632        2.58264       0.00002      CLP
chr22:22935612-22936112    2.513794       0.00002      CLP
chr12:110647657-110648157  2.494059       0.00002      CLP
...                             ...           ...      ...
chr19:47336760-47337260     0.58605       0.01005      CLP
chr2:101474036-101474536    0.58595      0.000616      CLP
chr11:104653394-104653894  0.585924      0.022442      CLP
chr12:89663417-89663917    0.585352      0.045297      CLP
chr6:150142812-150143312   0.585031      0.000074      CLP

[4583 rows x 3 columns], 'ERP':                             Log2FC Adjusted_pval Contrast
chr1:21353367-21353867    4.939022           0.0      ERP
chr8:73520503-73521003     4.93843           0.0      ERP
chr11:44780868-44781368    4.91324           0.0      ERP
chr13:44397846-44398346   4.868945           0.0      ERP
chr1:27542533-27543033    4.864803           0.0      ERP
...                            ...           ...      ...
chr1:179586214-179586714  0.586495           0.0      ERP
chr17:36482888-36483388   0.586426           0.0      ERP
chr12:79882960-79883460   0.586234      0.000066      ERP
chr21:34201754-34202254   0.586081           0.0      ERP
chr7:138400431-138400931  0.585121      0.000039      ERP

[3377 rows x 3 columns], 'HSC CACNB2':                             Log2FC Adjusted_pval    Contrast
chr17:13347389-13347889   0.914718           0.0  HSC CACNB2
chr3:152977353-152977853  0.900724           0.0  HSC CACNB2
chrX:72269788-72270288    0.897633           0.0  HSC CACNB2
chr5:119274871-119275371  0.897493           0.0  HSC CACNB2
chr4:155543343-155543843  0.897219           0.0  HSC CACNB2
...                            ...           ...         ...
chr15:34855325-34855825   0.585576           0.0  HSC CACNB2
chr15:52483673-52484173   0.585511           0.0  HSC CACNB2
chr10:71776295-71776795   0.585307           0.0  HSC CACNB2
chr2:15640184-15640684    0.585199           0.0  HSC CACNB2
chr21:45364168-45364668   0.585058           0.0  HSC CACNB2

[658 rows x 3 columns], 'HSC HIST1H2AC':                             Log2FC Adjusted_pval       Contrast
chrX:13470339-13470839    1.188423           0.0  HSC HIST1H2AC
chr2:108288396-108288896   1.17873           0.0  HSC HIST1H2AC
chrX:98398402-98398902    1.174786           0.0  HSC HIST1H2AC
chr5:45507129-45507629    1.171706           0.0  HSC HIST1H2AC
chr6:170506315-170506815  1.168462           0.0  HSC HIST1H2AC
...                            ...           ...            ...
chr8:28347974-28348474    0.585566           0.0  HSC HIST1H2AC
chr12:60144359-60144859   0.585439           0.0  HSC HIST1H2AC
chr8:144292302-144292802  0.585366           0.0  HSC HIST1H2AC
chr17:61252140-61252640    0.58524           0.0  HSC HIST1H2AC
chr14:92586155-92586655   0.585017           0.0  HSC HIST1H2AC

[2866 rows x 3 columns], 'HSC MYADM-CD97':                             Log2FC Adjusted_pval        Contrast
chr17:13347389-13347889   1.627802           0.0  HSC MYADM-CD97
chrX:72269788-72270288    1.593157           0.0  HSC MYADM-CD97
chr1:52139793-52140293    1.588745           0.0  HSC MYADM-CD97
chr1:207282009-207282509  1.573321           0.0  HSC MYADM-CD97
chr12:81668404-81668904   1.564987           0.0  HSC MYADM-CD97
...                            ...           ...             ...
chr9:121351196-121351696  0.585478           0.0  HSC MYADM-CD97
chr18:70781827-70782327   0.585219           0.0  HSC MYADM-CD97
chr2:20727942-20728442    0.585111           0.0  HSC MYADM-CD97
chr12:52259754-52260254   0.585101           0.0  HSC MYADM-CD97
chr17:63676658-63677158   0.585022           0.0  HSC MYADM-CD97

[2861 rows x 3 columns], 'HSC WNT11':                             Log2FC Adjusted_pval   Contrast
chr20:34583552-34584052   0.897686           0.0  HSC WNT11
chr1:157465990-157466490  0.857838           0.0  HSC WNT11
chr17:69703839-69704339   0.854879           0.0  HSC WNT11
chr16:86415293-86415793   0.836379           0.0  HSC WNT11
chr9:99833712-99834212    0.836237           0.0  HSC WNT11
...                            ...           ...        ...
chr5:45507129-45507629    0.585922           0.0  HSC WNT11
chr18:11652338-11652838   0.585842           0.0  HSC WNT11
chr2:46866827-46867327    0.585681           0.0  HSC WNT11
chr2:219182404-219182904  0.585369           0.0  HSC WNT11
chrX:112889319-112889819  0.585211           0.0  HSC WNT11

[232 rows x 3 columns], 'LMPP CDK6-FLT3':                              Log2FC Adjusted_pval        Contrast
chr17:60402734-60403234    1.689114           0.0  LMPP CDK6-FLT3
chr5:157867743-157868243   1.673037           0.0  LMPP CDK6-FLT3
chr6:119523493-119523993   1.670789           0.0  LMPP CDK6-FLT3
chr3:139202264-139202764   1.666878           0.0  LMPP CDK6-FLT3
chr5:98922016-98922516     1.653112           0.0  LMPP CDK6-FLT3
...                             ...           ...             ...
chr3:46926890-46927390     0.585351           0.0  LMPP CDK6-FLT3
chr19:41378266-41378766    0.585168           0.0  LMPP CDK6-FLT3
chr8:101314062-101314562    0.58502           0.0  LMPP CDK6-FLT3
chr22:29079289-29079789    0.584999           0.0  LMPP CDK6-FLT3
chr11:123484430-123484930  0.584964           0.0  LMPP CDK6-FLT3

[4573 rows x 3 columns], 'LMPP LSAMP':                             Log2FC Adjusted_pval    Contrast
chr19:28388949-28389449   2.962269           0.0  LMPP LSAMP
chr2:108776418-108776918  2.960341           0.0  LMPP LSAMP
chr5:158825148-158825648  2.957966           0.0  LMPP LSAMP
chr10:33715201-33715701   2.956146           0.0  LMPP LSAMP
chr3:29030843-29031343    2.953672           0.0  LMPP LSAMP
...                            ...           ...         ...
chr13:41916484-41916984   0.585829           0.0  LMPP LSAMP
chr3:122271735-122272235  0.585776           0.0  LMPP LSAMP
chr20:19943058-19943558   0.585691           0.0  LMPP LSAMP
chr11:44611700-44612200   0.585415           0.0  LMPP LSAMP
chr15:38817713-38818213    0.58509           0.0  LMPP LSAMP

[5472 rows x 3 columns], 'LMPP Naive T-cell':                             Log2FC Adjusted_pval           Contrast
chr2:231672428-231672928  5.046681           0.0  LMPP Naive T-cell
chr17:57552218-57552718   4.440933           0.0  LMPP Naive T-cell
chr2:234164455-234164955  4.180235           0.0  LMPP Naive T-cell
chr22:44025612-44026112   4.119872           0.0  LMPP Naive T-cell
chr11:65639492-65639992    4.04134           0.0  LMPP Naive T-cell
...                            ...           ...                ...
chr6:24666804-24667304    0.586347           0.0  LMPP Naive T-cell
chr22:48098172-48098672   0.586282           0.0  LMPP Naive T-cell
chr7:111408651-111409151  0.585472      0.000576  LMPP Naive T-cell
chr9:99129969-99130469    0.585327      0.000062  LMPP Naive T-cell
chr2:127829811-127830311  0.584968      0.000556  LMPP Naive T-cell

[1764 rows x 3 columns], 'LMPP PRSS1':                             Log2FC Adjusted_pval    Contrast
chr10:33715201-33715701   2.414107           0.0  LMPP PRSS1
chr3:29030843-29031343    2.412636           0.0  LMPP PRSS1
chr21:38525418-38525918   2.412636           0.0  LMPP PRSS1
chr19:28388949-28389449   2.411385           0.0  LMPP PRSS1
chr20:53686072-53686572   2.410341           0.0  LMPP PRSS1
...                            ...           ...         ...
chr1:212596123-212596623  0.585835           0.0  LMPP PRSS1
chr19:19451403-19451903   0.585529           0.0  LMPP PRSS1
chr1:43836421-43836921    0.585337           0.0  LMPP PRSS1
chr6:31351814-31352314    0.585109           0.0  LMPP PRSS1
chr3:69092014-69092514    0.585054           0.0  LMPP PRSS1

[5788 rows x 3 columns], 'LT-HSC HLF':                              Log2FC Adjusted_pval    Contrast
chr5:45507129-45507629     1.531193           0.0  LT-HSC HLF
chr6:170506315-170506815   1.528928           0.0  LT-HSC HLF
chr10:13133034-13133534    1.509826           0.0  LT-HSC HLF
chr22:37613498-37613998    1.505232           0.0  LT-HSC HLF
chr12:103540329-103540829  1.502772           0.0  LT-HSC HLF
...                             ...           ...         ...
chr14:24313086-24313586    0.585542           0.0  LT-HSC HLF
chr20:18589905-18590405    0.585467           0.0  LT-HSC HLF
chr17:75864253-75864753    0.585419           0.0  LT-HSC HLF
chr12:66776806-66777306    0.585236           0.0  LT-HSC HLF
chr8:109745381-109745881   0.585025           0.0  LT-HSC HLF

[3537 rows x 3 columns], 'MDP-2 GPR133':                              Log2FC Adjusted_pval      Contrast
chr2:231672428-231672928   2.663041       0.00004  MDP-2 GPR133
chr16:29339242-29339742    2.645864      0.000032  MDP-2 GPR133
chr7:2723132-2723632       2.619371      0.000032  MDP-2 GPR133
chr12:110647657-110648157  2.560181      0.000032  MDP-2 GPR133
chr22:22935612-22936112    2.539039      0.000032  MDP-2 GPR133
...                             ...           ...           ...
chr19:3849137-3849637      0.585687      0.023213  MDP-2 GPR133
chr10:74058584-74059084    0.585513      0.000168  MDP-2 GPR133
chr3:184492115-184492615   0.585447      0.001824  MDP-2 GPR133
chrX:114268580-114269080   0.585185      0.002259  MDP-2 GPR133
chr7:139811046-139811546   0.585112      0.000575  MDP-2 GPR133

[4048 rows x 3 columns], 'MDP-pDC':                              Log2FC Adjusted_pval Contrast
chr2:231672428-231672928   5.797586           0.0  MDP-pDC
chr17:57552218-57552718    5.176988           0.0  MDP-pDC
chr2:234164455-234164955    4.96483           0.0  MDP-pDC
chr22:44025612-44026112    4.882341           0.0  MDP-pDC
chr11:65639492-65639992    4.872473           0.0  MDP-pDC
...                             ...           ...      ...
chr19:3324341-3324841      0.585875      0.000001  MDP-pDC
chr8:38830617-38831117      0.58583      0.000004  MDP-pDC
chr13:98484715-98485215    0.585555      0.006703  MDP-pDC
chr12:120437801-120438301  0.585127      0.000002  MDP-pDC
chr12:132558281-132558781  0.585089      0.000273  MDP-pDC

[4951 rows x 3 columns], 'MEP-MKP':                            Log2FC Adjusted_pval Contrast
chr8:73520503-73521003    3.73497           0.0  MEP-MKP
chr1:21353367-21353867    3.70901           0.0  MEP-MKP
chr11:44780868-44781368   3.69003           0.0  MEP-MKP
chr1:27542533-27543033   3.619486           0.0  MEP-MKP
chr13:44397846-44398346  3.613465           0.0  MEP-MKP
...                           ...           ...      ...
chr1:31162607-31163107   0.586068           0.0  MEP-MKP
chr7:94395369-94395869   0.585895           0.0  MEP-MKP
chr4:74134146-74134646   0.585226           0.0  MEP-MKP
chr17:29117081-29117581  0.585171           0.0  MEP-MKP
chr19:41367776-41368276  0.584975           0.0  MEP-MKP

[3786 rows x 3 columns], 'ML-Gran':                             Log2FC Adjusted_pval Contrast
chr2:239814585-239815085  1.545085           0.0  ML-Gran
chr7:2214382-2214882       1.53738           0.0  ML-Gran
chr10:11901863-11902363    1.43475           0.0  ML-Gran
chr6:5162341-5162841      1.428097           0.0  ML-Gran
chr4:6888842-6889342      1.397363           0.0  ML-Gran
...                            ...           ...      ...
chr22:35602121-35602621   0.585847           0.0  ML-Gran
chr11:93718133-93718633    0.58579           0.0  ML-Gran
chr16:84913249-84913749   0.585337           0.0  ML-Gran
chr11:1152253-1152753     0.585231           0.0  ML-Gran
chr2:88858054-88858554    0.585031           0.0  ML-Gran

[1035 rows x 3 columns], 'MPP Ribo-high': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'MPP SPINK2-CD99': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'MultiLin-ATAC':                             Log2FC Adjusted_pval       Contrast
chr12:2294663-2295163     1.908098           0.0  MultiLin-ATAC
chrX:15911664-15912164    1.876974           0.0  MultiLin-ATAC
chr3:128523503-128524003  1.864965           0.0  MultiLin-ATAC
chr9:129058106-129058606  1.818677           0.0  MultiLin-ATAC
chr14:59361084-59361584   1.807945           0.0  MultiLin-ATAC
...                            ...           ...            ...
chr8:100577336-100577836  0.585536           0.0  MultiLin-ATAC
chr17:82334744-82335244   0.585499           0.0  MultiLin-ATAC
chr10:30049091-30049591   0.585449           0.0  MultiLin-ATAC
chr15:89820641-89821141    0.58526           0.0  MultiLin-ATAC
chr16:68284770-68285270   0.585148           0.0  MultiLin-ATAC

[2195 rows x 3 columns], 'ST-HSC PBX1':                             Log2FC Adjusted_pval     Contrast
chr16:59048819-59049319    0.69066           0.0  ST-HSC PBX1
chr1:100628676-100629176  0.689165           0.0  ST-HSC PBX1
chr2:195047452-195047952  0.686393           0.0  ST-HSC PBX1
chr9:3825541-3826041      0.686267           0.0  ST-HSC PBX1
chr15:35595622-35596122   0.685384           0.0  ST-HSC PBX1
...                            ...           ...          ...
chr4:21041593-21042093    0.585442           0.0  ST-HSC PBX1
chr1:209925957-209926457  0.585398           0.0  ST-HSC PBX1
chr1:169465253-169465753  0.585347           0.0  ST-HSC PBX1
chr13:98454535-98455035   0.585183           0.0  ST-HSC PBX1
chr18:36168255-36168755   0.585096           0.0  ST-HSC PBX1

[1022 rows x 3 columns], 'pre-Gran CP':                             Log2FC Adjusted_pval     Contrast
chr12:2294663-2295163      3.18217           0.0  pre-Gran CP
chrX:15911664-15912164    3.137117           0.0  pre-Gran CP
chr3:128523503-128524003   3.08506           0.0  pre-Gran CP
chr9:129058106-129058606  3.076241           0.0  pre-Gran CP
chr13:28468022-28468522   3.014072           0.0  pre-Gran CP
...                            ...           ...          ...
chr6:117547619-117548119  0.585609           0.0  pre-Gran CP
chr4:146243417-146243917  0.585519           0.0  pre-Gran CP
chr3:50626386-50626886    0.585345           0.0  pre-Gran CP
chr18:73768053-73768553   0.585049           0.0  pre-Gran CP
chr6:10603219-10603719    0.584966           0.0  pre-Gran CP

[3673 rows x 3 columns], 'pre-MEP':                             Log2FC Adjusted_pval Contrast
chr10:71980251-71980751   1.818789           0.0  pre-MEP
chr10:12328693-12329193   1.813141           0.0  pre-MEP
chr3:189890471-189890971  1.809711           0.0  pre-MEP
chr14:29650519-29651019   1.807064           0.0  pre-MEP
chr9:591201-591701        1.798752           0.0  pre-MEP
...                            ...           ...      ...
chr8:84626097-84626597     0.58563           0.0  pre-MEP
chr2:126368715-126369215   0.58541           0.0  pre-MEP
chr6:87723209-87723709    0.585265           0.0  pre-MEP
chr16:19130706-19131206   0.585088           0.0  pre-MEP
chr11:32056360-32056860   0.585063           0.0  pre-MEP

[3227 rows x 3 columns], 'pre-PC':                             Log2FC Adjusted_pval Contrast
chr2:231672428-231672928  6.236564           0.0   pre-PC
chr17:57552218-57552718   5.575458           0.0   pre-PC
chr2:234164455-234164955  5.337587           0.0   pre-PC
chr22:44025612-44026112   5.195669           0.0   pre-PC
chr11:65639492-65639992   5.120273           0.0   pre-PC
...                            ...           ...      ...
chr12:11650900-11651400   0.588236      0.000293   pre-PC
chr13:30114859-30115359   0.586593           0.0   pre-PC
chr9:129459108-129459608  0.586559           0.0   pre-PC
chr1:92485186-92485686    0.586446           0.0   pre-PC
chr19:41530845-41531345   0.586325           0.0   pre-PC

[2235 rows x 3 columns]}

@SeppeDeWinter
Collaborator

Hi @sid5427

Yes indeed, it's these empty dataframes in markers_dict that are causing the error (i.e. 'CD14-Mono', 'MPP Ribo-high' and 'MPP SPINK2-CD99').

You should remove these prior to running:

for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    #print(regions)
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))

You can also do it like this

for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    if len(regions) > 0:
        region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))

These dataframes are empty because no regions passed the thresholds (by default, a fold change of 1.5, i.e. a log2 fold change of about 0.585, and an adjusted p-value < 0.05). You can also change these thresholds in the find_diff_features function to get more regions.
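The smallest Log2FC values in the output above sit just above 0.585, which is log2(1.5). The sketch below works through that arithmetic and shows how lowering the cut-off lets more regions through. The constant and function names are made up for illustration, and whether the library compares with > or >= is an assumption here.

```python
import math

# Assumed defaults, inferred from this thread: fold change 1.5, adjusted p-value below 0.05.
LOG2FC_THRESHOLD = math.log2(1.5)
ADJ_PVAL_THRESHOLD = 0.05


def passes_thresholds(log2fc, adj_pval,
                      log2fc_thr=LOG2FC_THRESHOLD,
                      adj_pval_thr=ADJ_PVAL_THRESHOLD):
    """Return True when a region clears both cut-offs."""
    return log2fc >= log2fc_thr and adj_pval < adj_pval_thr


print(round(LOG2FC_THRESHOLD, 3))  # 0.585

# A region taken from the CLP table above clears the assumed defaults.
print(passes_thresholds(0.585031, 0.000074))  # True

# A weaker region fails by default but passes with a fold-change cut-off of 1.2.
print(passes_thresholds(0.5, 0.001))                             # False
print(passes_thresholds(0.5, 0.001, log2fc_thr=math.log2(1.2)))  # True
```

A cell type such as 'CD14-Mono' ends up with an empty dataframe when none of its regions clears both cut-offs.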

Best,

Seppe

@sid5427
Author

sid5427 commented Jan 24, 2023

Hi Seppe,

Thanks for the solution - I'll incorporate that into my run. I had tried this to remove the three troublesome clusters -

##remove clusters CD14-Mono, MPP Ribo-high, MPP SPINK2-CD99
clusters_to_remove = ['MPP Ribo-high', 'CD14-Mono', 'MPP SPINK2-CD99']
adata = adata[~adata.obs['cell_type'].isin(clusters_to_remove)] ##replace original adata with filtered one
adata.obs.cell_type

This did work, and it generated a scenicplus object with some of the downstream figures. However I get an error later for this part -

from scenicplus.cistromes import TF_cistrome_correlation, generate_pseudobulks

generate_pseudobulks(
        scplus_obj = scplus_obj,
        variable = 'GEX_cell_type',
        auc_key = 'eRegulon_AUC_filtered',
        signature_key = 'Gene_based')
generate_pseudobulks(
        scplus_obj = scplus_obj,
        variable = 'GEX_cell_type',
        auc_key = 'eRegulon_AUC_filtered',
        signature_key = 'Region_based')

TF_cistrome_correlation(
            scplus_obj,
            use_pseudobulk = True,
            variable = 'GEX_cell_type',
            auc_key = 'eRegulon_AUC_filtered',
            signature_key = 'Gene_based',
            out_key = 'filtered_gene_based')
TF_cistrome_correlation(
            scplus_obj,
            use_pseudobulk = True,
            variable = 'GEX_cell_type',
            auc_key = 'eRegulon_AUC_filtered',
            signature_key = 'Region_based',
            out_key = 'filtered_region_based')

and this is the error -

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 3
      1 from scenicplus.cistromes import TF_cistrome_correlation, generate_pseudobulks
----> 3 generate_pseudobulks(
      4         scplus_obj = scplus_obj,
      5         variable = 'GEX_cell_type',
      6         auc_key = 'eRegulon_AUC_filtered',
      7         signature_key = 'Gene_based')
      8 generate_pseudobulks(
      9         scplus_obj = scplus_obj,
     10         variable = 'GEX_cell_type',
     11         auc_key = 'eRegulon_AUC_filtered',
     12         signature_key = 'Region_based')
     14 TF_cistrome_correlation(
     15             scplus_obj,
     16             use_pseudobulk = True,
   (...)
     19             signature_key = 'Gene_based',
     20             out_key = 'filtered_gene_based')

File ~/testing_area/scenicplus/src/scenicplus/cistromes.py:227, in generate_pseudobulks(scplus_obj, variable, normalize_expression, auc_key, signature_key, nr_cells, nr_pseudobulks, seed)
    225 for x in range(nr_pseudobulks):
    226     random.seed(x)
--> 227     sample_cells = sample(cells, nr_cells)
    228     sub_dgem = dgem.loc[sample_cells, :].mean(axis=0)
    229     sub_auc = cistromes_auc.loc[sample_cells, :].mean(axis=0)

File ~/.conda/envs/py_3_8/lib/python3.8/random.py:363, in Random.sample(self, population, k)
    361 n = len(population)
    362 if not 0 <= k <= n:
--> 363     raise ValueError("Sample larger than population or is negative")
    364 result = [None] * k
    365 setsize = 21        # size of a small set minus size of an empty list

ValueError: Sample larger than population or is negative

Is this related to my ad-hoc solution? Will using the code snippet you provided solve this error downstream?

Appreciate the help!
Sid

@SeppeDeWinter
Collaborator

Hi @sid5427

This is a known "bug", caused by the fact that your annotation (GEX_cell_type) contains a group with fewer than 5 cells.

However, the fact that you're at this step means that SCENIC+ has indeed run successfully. You can skip this optional step for now by setting calculate_TF_eGRN_correlation to False. I will fix this bug as soon as I have some time.
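To see which groups would trigger this, one could count cells per label before calling generate_pseudobulks. The sketch below uses only the standard library; groups_below and sample_at_most are hypothetical helpers and not the fix that went into scenicplus. The name nr_cells is taken from the function signature shown in the traceback above.

```python
import random
from collections import Counter


def groups_below(labels, nr_cells):
    """Return {label: count} for labels with fewer than nr_cells cells."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < nr_cells}


def sample_at_most(cells, nr_cells, seed=0):
    """Sample nr_cells cells, or all of them when the group is smaller."""
    rng = random.Random(seed)
    return rng.sample(cells, min(nr_cells, len(cells)))


# Stand-in labels: one group is large enough, the other is not.
labels = ["BMCP"] * 8 + ["CLP"] * 3
print(groups_below(labels, nr_cells=5))  # {'CLP': 3}

# Asking random.sample for more items than exist raises the ValueError seen above.
try:
    random.sample(["cell_1", "cell_2", "cell_3"], 5)
except ValueError as err:
    print(err)

# Capping the sample size avoids the error for small groups.
print(len(sample_at_most(["cell_1", "cell_2", "cell_3"], 5)))  # 3
```

Capping the sample size is just one possible workaround; removing or merging the very small groups before this step would be another.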

Best,

Seppe

@SeppeDeWinter
Collaborator

Hi,

you can use the function from commit 6b4bdad instead. It does not require generating pseudobulks beforehand.

Best,

Seppe

@RosaDeSa

Same problem. I don't have menr.pkl and DEM_*_topics.pkl after running run_pycistarget.
I have only CTX files.
What could be the problem @SeppeDeWinter ?

@SeppeDeWinter
Collaborator

@RosaDeSa

Did you have any error messages after running run_pycistarget?
If not, you can try running using a single core, this might reveal some error message that was not passed properly.

Best,

Seppe

@RosaDeSa

Thanks @SeppeDeWinter using a single core, it worked!

@SeppeDeWinter
Collaborator

You did not see any error messages using a single core?

Best,

Seppe

@RosaDeSa

Oddly, it worked without errors and produced all of the output files when using a single core.
Best,
Rosa

@CYorick

CYorick commented Nov 1, 2023

Hi @sid5427

From the error I suspect that region_sets['DARs'] might be empty or contain empty entries. Could you show the output of region_sets['DARs'] to confirm this?

On your question whether it is possible to run SCENIC+ with a couple of the partial results: yes, this is possible. You can generate the menr dictionary like this (in your case):

import dill
CTX_topics_otsu_All = dill.load(open('results/motifs/CTX_topics_otsu_All.pkl', 'rb'))
DEM_topics_otsu_All = dill.load(open('results/motifs/DEM_topics_otsu_All.pkl', 'rb'))
CTX_topics_top_3_All = dill.load(open('results/motifs/CTX_topics_top_3_All.pkl', 'rb'))
DEM_topics_top_3_All = dill.load(open('results/motifs/DEM_topics_top_3_All.pkl', 'rb'))

# Start from an empty dictionary, then add each partial result under its own key.
menr = {}
menr['CTX_topics_otsu_All'] = CTX_topics_otsu_All
menr['DEM_topics_otsu_All'] = DEM_topics_otsu_All
menr['CTX_topics_top_3_All'] = CTX_topics_top_3_All
menr['DEM_topics_top_3_All'] = DEM_topics_top_3_All

Best,

Seppe

I have a similar problem: my markers_dict is empty, which may be why the core died while running run_pycistarget. It did not create CTX_topics_otsu_All.pkl or any of the other pkl files. Instead, I only have the CTX_topics_otsu_All files. Should I combine all the html files, turn them into a pkl, and then run the above code?
