
Parallelization error when optimizing parameters for NicheNet #292

Open
tkapello opened this issue Aug 6, 2024 · 4 comments

tkapello commented Aug 6, 2024

Hi,

I am using the Omnipath resources for NicheNet analysis on a Windows computer and was optimizing the model parameters as shown below:

# Parameter optimization
library(nichenetr)
library(tidyverse)
library(mlrMBO)      # also attaches smoof and ParamHelpers (makeMultiObjectiveFunction, makeParamSet)
library(parallelMap) # used internally for parallelization
expression_settings_validation<-readRDS(url("https://zenodo.org/record/3260758/files/expression_settings.rds"))

my_source_weights_df<-tibble(source=unique(c(lr_network$source, sig_network$source, gr_network$source)), 
                             weight=rep(1, length(unique(c(lr_network$source, sig_network$source, gr_network$source)))))

additional_arguments_topology_correction<-list(source_names=my_source_weights_df$source %>% unique(), 
                                               algorithm="PPR", 
                                               correct_topology=FALSE,
                                               lr_network=lr_network, 
                                               sig_network=sig_network, 
                                               gr_network=gr_network, 
                                               settings=lapply(expression_settings_validation, convert_expression_settings_evaluation),
                                               secondary_targets=FALSE, 
                                               remove_direct_links="no", 
                                               cutoff_method="quantile")

nr_datasources<-additional_arguments_topology_correction$source_names %>% length()

obj_fun_multi_topology_correction<-makeMultiObjectiveFunction(name="nichenet_optimization",
                                                              description="data source weight and hyperparameter optimization: expensive black-box function",
                                                              fn=model_evaluation_optimization, 
                                                              par.set=makeParamSet(makeNumericVectorParam("source_weights", len=nr_datasources, lower=0, upper=1, tunable=FALSE),
                                                                                   makeNumericVectorParam("lr_sig_hub", len=1, lower=0, upper=1, tunable=TRUE),  
                                                                                   makeNumericVectorParam("gr_hub", len=1, lower=0, upper=1, tunable=TRUE),  
                                                                                   makeNumericVectorParam("ltf_cutoff", len=1, lower=0.9, upper=0.999, tunable=TRUE), 
                                                                                   makeNumericVectorParam("damping_factor", len=1, lower=0.01, upper=0.99, tunable=TRUE)), 
                                                              has.simple.signature=FALSE,
                                                              n.objectives=4, 
                                                              noisy=FALSE,
                                                              minimize=c(FALSE, FALSE, FALSE, FALSE))

optimization_results=lapply(1, mlrmbo_optimization, 
                            obj_fun=obj_fun_multi_topology_correction, 
                            niter=8, ncores=5, nstart=1250, 
                            additional_arguments=additional_arguments_topology_correction)

However, I get the following error:

Error in parallelStart(mode = MODE_MULTICORE, cpus = cpus, level = level,  : 
  Multicore mode not supported on windows!

I understand that the issue is the Windows OS, which cannot fork processes to run functions on several cores. I tried to work around it with parallelStart(mode='socket', cpus=5), but with no success. I would appreciate any support!
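
For reference, fork-based multicore parallelization is not available on Windows, so parallelMap only offers the socket backend there. Below is a minimal sketch of a working socket-mode call, assuming only that the parallelMap package is installed; it illustrates the backend itself and is not a confirmed fix, since the failing call hard-codes mode = MODE_MULTICORE and may ignore a backend registered beforehand:

# Minimal parallelMap socket-mode sketch (illustration only)
library(parallelMap)

parallelStartSocket(cpus=5, show.info=TRUE)  # socket clusters work on Windows (no forking needed)
res<-parallelMap(function(i) i^2, 1:10)      # toy workload to confirm the workers run
parallelStop()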

csangara (Member) commented

Hi,

From NicheNet v2 onwards we started using nsga2R optimization instead of mlrMBO, as it is much faster. Unfortunately this means I am not able to help you with this particular issue, as the code in question is 5+ years old at this point, and it seems the parallelMap package we used for parallelization has been deprecated for 4 years now.

I would recommend switching to the nsga2R functions we have...but the optimization would still take multiple days to run, so I'm not sure it will be feasible on a personal computer.
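
For orientation, here is a rough sketch of the shape of an nsga2R run over the same search space as above; my_objective_fn is a placeholder, not the actual nichenetr wrapper, so please check the current parameter optimization scripts for the real function names and settings. Also note that nsga2R::nsga2R() minimizes, so objectives that should be maximized have to be negated inside the objective function.

# Rough nsga2R sketch (my_objective_fn is a placeholder, not the nichenetr v2 wrapper)
library(nsga2R)

res<-nsga2R(fn=my_objective_fn,                              # must return a numeric vector of 4 objective values
            varNo=nr_datasources + 4,                        # source weights + 4 hyperparameters
            objDim=4,
            lowerBounds=c(rep(0, nr_datasources), 0, 0, 0.9, 0.01),
            upperBounds=c(rep(1, nr_datasources), 1, 1, 0.999, 0.99),
            popSize=100, generations=15)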

Thanks for trying out the optimization though, it's the first issue we've gotten about this 😄 It's probably going to be very tricky to make it run on a different system, so let me know if you run into more problems.

Best regards,
Chananchida


tkapello commented Aug 27, 2024

Hi @csangara,

Just a clarification: I was going to run the optimization steps on a desktop computer with 64 GB RAM. In the tutorial, I see that you used an HPC and the run took a few days. Unfortunately, I do not have that possibility, so I was wondering how critical this step really is (even using nsga2R) for running NicheNet with an updated set of ligand-receptor pairs from the Omnipath database. In other words, could I use the merged ligand-target pairs from Omnipath & NicheNet without the optimization steps?

Thanks in advance,
Theo

csangara (Member) commented

Hi Theo,

The optimization step can slightly improve the performance of NicheNet target gene prediction, but I would not say it is very critical (see Supplementary Notes Figures 2.3 and 2.4 in the MultiNicheNet paper). So you can definitely use updated LR pairs from Omnipath without optimization. However, when reconstructing the ligand-target matrix, I would recommend giving the new source a weight similar to the original Omnipath LR data source (named "omnipath", with a score of ~0.16; see also Supplementary Table 1B,E), as sketched below.
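
A minimal sketch of what that could look like, following the model construction vignette and using the default (non-optimized) hyperparameters; the source label "omnipath_new" is a placeholder for whatever name the added OmniPath LR resource carries in your networks:

# Hedged sketch: give the new OmniPath LR source ("omnipath_new" is a placeholder
# name) a weight of ~0.16, leave the other source weights as they are in
# my_source_weights_df, and build the ligand-target matrix with default hyperparameters.
library(nichenetr)
library(tidyverse)

my_source_weights_df<-my_source_weights_df %>% 
  mutate(weight=ifelse(source == "omnipath_new", 0.16, weight))

weighted_networks<-construct_weighted_networks(lr_network=lr_network, 
                                               sig_network=sig_network, 
                                               gr_network=gr_network, 
                                               source_weights_df=my_source_weights_df)

ligands<-as.list(unique(lr_network$from))
ligand_target_matrix<-construct_ligand_target_matrix(weighted_networks, ligands=ligands, 
                                                     algorithm="PPR", 
                                                     damping_factor=0.5, ltf_cutoff=0.99)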


tkapello commented Sep 3, 2024

Thank you @csangara,

One last question! I understand the purpose of giving weights to different databases, e.g. Omnipath. When I combine NicheNet and Omnipath, I have >1,000 sources, some of which are in the table you referred me to. However, I was thinking that if I give special weights to only those sources (about 60 in number), I might bias my downstream predictions. What do you think? Would you still recommend weighting a few databases, or keeping the same default weight (i.e. 1) for all of them?
