Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC: parallel population optimizer #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alyst
Copy link
Contributor

@alyst alyst commented Jul 20, 2015

This is an attempt to implement some parallelization. Of course, the most straightforward way would be to parallelize fitness calculation in rank_by_fitness!(), but since Julia doesn't have threads [yet], it would mean lots of data sending/synchronization overhead. So the goal was to have something that require less messaging and synchronization points.

ParallelPopulationOptimizer basically starts N independent population optimizers on worker processes. After migrationPeriod steps each worker randomly selects migrationSize individuals and sends them to the master process. Master process just constantly listens to the incoming immigrants and distributes them among the other (N-1) workers (+ collects the best ones). The acceptance of the immigrants by the workers happens through the standard tell!() interface: an immigrant replaces some random "aborigine" if it is more fit.

Parallelization is implemented using native Julian RemoteRef, Channel and Task mechanisms. Channels were added only in Julia v0.4, so it would not work on v0.3.

It's definitely not for production use yet, because there's no handling of exceptions in the worker processes.
Also, I don't know how efficient is this scheme in terms of convergence, and whether there's a way to improve it. At least in my experiments I can run 8 workers in parallel, and immigrants acceptance rate is around 20%.

@robertfeldt
Copy link
Owner

Interesting. This is a classic Master/Slave optimization scheme then and might be useful and should be able to speed up convergence (vs time). However, presently it fails the travis CI build (0.3 issue?) so not sure how we handle this right now.

@alyst
Copy link
Contributor Author

alyst commented Jul 21, 2015

Indeed it does speed up the convergence (interestingly, contrary to what R's DEoptim recommends (population size = 10 x number of parameters), for my problem it works best with small (50) population on each worker).

I guess it will never run on 0.3, because it misses the key functionality. Since the PR uses MessageUtils master, which has no tagged version, it is not straightforward to run on 0.4 either.
When/if the new parallel building blocks would land to Julia master, I will update the PR. It could be disabled on 0.3 to make the tests pass.

So it's not ready to merge, but I though it's worth publishing as PR to provide some parallelization starting point for those that might be interested.

@matthieugomez
Copy link
Contributor

Parallelization is great.

The functions I optimize generally use a lot of temporary memory, so I generally add a set preallocated arrays as an argument. When optimizing the function, I preallocate arrays, and then feed bboptimize with a closure

pa = PreallocateArray()
bboptimize(x -> f(x, pa))

I'm not sure how closures work with parallel workers. One solution would be to recreate arrays at every iterations

bboptimize(x -> f(x, PreallocateArray()))

but maybe there's a better way. I'd be great if this parallel solution handled these cases

@alyst
Copy link
Contributor Author

alyst commented Jul 21, 2015

@matthieugomez I think we should be able to safely combine preallocation and parallelization -- we just need to be sure that each worker receives its own copy of "functional object" f.
Actually, I'm already using similar approach that combines well with parallelization. For my problem I define a BBF <: OptimizationProblem type. It requires a few trivial methods: fitness_scheme(::BBF), name(::BBF), numobjectives(::BBF), numdims(::BBF), search_space(::BBF), and, most importantly, fitness(x::Vector{Float64}, f::BBF) function, i.e. my f(x).
In fitness(f, x) I can take advantage of whatever data or temporary arrays that are stored in BBF. The important point is that temporary array allocation happens on-demand, i.e. freshly created f::BBF object does not have any temporary arrays, so its copies could be safely distributed among the workers, and then each worker would have its own temporary memory automatically allocated when f(x) is first called.

In v0.4 it could be even simpler. You can define Base.call(f::F, x) method and then you can do f(x) without any need to create closures, which currently have poor performance in Julia. The only problem is that f would not be of type Function and e.g. Optim methods do not accept such arguments, but that should be easy to fix.

@matthieugomez
Copy link
Contributor

Great. Thanks for the explanation.

@robertfeldt
Copy link
Owner

Suggestion: Allow multiple different WorkerMethods. That might increase robustness on some problems (since a single optimizer is never best for all problems) and might also help convergence (worse optimizers might get out of local optima by "inspiration" from a better optimizer etc). Would be very nice if one can have also non-population optimizers in there since some of them are very much quicker on "simple" problems and it is well known that DE benefits from early "help" with a few good solutions in the population.

@alyst
Copy link
Contributor Author

alyst commented Jul 21, 2015

Definitely that would be very nice to have. But that requires more advanced communication to solve the load balancing problems, because the current scheme assumes migration rates are the same for all workers. Ultimately, it would be nice to have something like a parallelized version of Amalgam.

@robertfeldt
Copy link
Owner

Ok, yes.

Amalgam is very nice but parallelism on a lower level then. We should experiment with a eval parallel "block" size that we do pmap on at some point. If we are only sending numdims floats over in a vector I guess it should be ok for most but the very simple fitness functions.

@steven-varga
Copy link

Hi

I am wondering if an MPI implementation of the above idea is welcome? if so I tried to find 'ParallelPopulationOptimizer' code but somehow I failed when forked Alexey Stukalov rep;

how can I get a copy/clone of the relevant files?
best,
steve

@alyst
Copy link
Contributor Author

alyst commented Jul 24, 2015

@steven-varga Hi, if you want to distribute the calculations over several machines, MPI could be a way to go (you can also consider ZeroMQ or use non-local cluster managers -- the latter should require minimal modifications to the current code). For one machine I think the current approach is better, because it uses native Julia task scheduling and data messaging.

If you want to test this pull request in Julia, you can try

Pkg.add("MessageUtils")
Pkg.checkout("MessageUtils") # parallel optimizer requires the master of MessageUtils
Pkg.clone("https://github.com/alyst/BlackBoxOptim.jl.git")
Pkg.checkout("BlackBoxOptim", branch="parallel_pop_opt")

But be warned that the code is still experimental. Also I've rebased it on top of new BlackBoxOptim API, but haven't tested yet. Pls let me know if you have further questions.

@alyst
Copy link
Contributor Author

alyst commented Aug 6, 2015

I've updated the code to use Channels very recently introduced into Julia 0.4-dev, so MessageUtils are no longer required. However, the "smoketest" using parallel optimizer stalls for some yet-to-be-investigated reason.

@alyst alyst force-pushed the parallel_pop_opt branch 2 times, most recently from 520bebd to 20a2a69 Compare August 10, 2015 23:36
@alyst
Copy link
Contributor Author

alyst commented Aug 10, 2015

Updated to use recently introduced type-parameterized RemoteRefs and parallel exception handling.
The tests now pass on 0.4-nightly (the current CI failure is due to XNES smoketest randomly failed).
It starts to be more usable as now there should be no stalls due to worker threads silently crashing.

@alyst alyst force-pushed the parallel_pop_opt branch 3 times, most recently from 9f09ea9 to 8310746 Compare August 19, 2015 11:41
@robertfeldt
Copy link
Owner

@alyst This can now be closed, right, since it was superseeded by the ParallelEvaluator code?

@alyst
Copy link
Contributor Author

alyst commented Nov 28, 2015

@robertfeldt ParallelEvaluator covers methods that generate multiple candidates at once, e.g. NES. This master/slave optimizer [better] fits DE-like algorithms. So both can co-exist.
I had not rebased the PR recently (now that ParallelEvaluator is in, I would do it shortly), but it's in a working state, I'm using it in combination with DE for my problems.

I would leave it as PR. At least it gives a starting point for parallelization of multiple different optimizers.

@robertfeldt
Copy link
Owner

Ok, yes, I remember now. Ok lets leave this as PR for now. Great if you can take a look at the example of parallel eval so we can help guide people on its use. I guess we should use a NES alg in it, then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants