WIP/RFC: parallel population optimizer #25

alyst · 2015-07-20T16:05:04Z

This is an attempt to implement some parallelization. Of course, the most straightforward way would be to parallelize the fitness calculation in rank_by_fitness!(), but since Julia doesn't have threads [yet], that would mean a lot of data sending/synchronization overhead. So the goal was to have something that requires less messaging and fewer synchronization points.

ParallelPopulationOptimizer basically starts N independent population optimizers on worker processes. After migrationPeriod steps each worker randomly selects migrationSize individuals and sends them to the master process. The master process just constantly listens for incoming immigrants and distributes them among the other (N-1) workers (+ collects the best ones). The workers accept immigrants through the standard tell!() interface: an immigrant replaces some random "aborigine" if it is more fit.
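To make the migration rule concrete, here is a minimal serial sketch in current Julia syntax. It is not the PR's code: `Island`, `emigrants`, `accept!`, `migrate!` and the toy `sphere` objective are hypothetical names, lower fitness is assumed to be better, and each immigrant is assumed to be forwarded to every other island (the description above could also be read as spreading immigrants among them). The process and messaging plumbing is left out entirely.

```julia
using Random

sphere(x) = sum(abs2, x)   # toy objective, an assumption for this sketch

# Hypothetical minimal island: candidates plus their fitness (lower is better).
mutable struct Island
    pop::Vector{Vector{Float64}}
    fit::Vector{Float64}
end

function Island(rng, popsize, ndims)
    pop = [randn(rng, ndims) for _ in 1:popsize]
    return Island(pop, map(sphere, pop))
end

# Emigration: pick `migration_size` random individuals (as copies).
function emigrants(rng, isl::Island, migration_size)
    idx = randperm(rng, length(isl.pop))[1:migration_size]
    return [(copy(isl.pop[i]), isl.fit[i]) for i in idx]
end

# Acceptance: the immigrant replaces a random resident only if it is fitter.
function accept!(rng, isl::Island, x, fx)
    j = rand(rng, 1:length(isl.pop))
    fx < isl.fit[j] || return false
    isl.pop[j] = x
    isl.fit[j] = fx
    return true
end

# Master step: forward each emigrant of island `src` to all other islands
# and count how many were accepted.
function migrate!(rng, islands, src, migration_size)
    accepted = 0
    for (x, fx) in emigrants(rng, islands[src], migration_size)
        for (k, isl) in enumerate(islands)
            k == src && continue
            accepted += accept!(rng, isl, copy(x), fx)
        end
    end
    return accepted
end
```

Calling `migrate!(MersenneTwister(1), islands, 1, 2)` on a vector of four islands offers two emigrants from island 1 to the other three. The returned count divided by the number of offers gives an acceptance rate for that round.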

Parallelization is implemented using native Julian RemoteRef, Channel and Task mechanisms. Channels were added only in Julia v0.4, so it would not work on v0.3.
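As a rough illustration of the Channel/Task side of this, the sketch below routes messages inside a single process using current Julia syntax. It is an assumption-laden toy, not the PR's implementation: the names `inbox` and `outboxes` are hypothetical, three "workers" each send one individual, and there are no remote processes involved (in later Julia versions the cross-process counterpart of a Channel is `RemoteChannel`).

```julia
# One shared inbox for the master, one outbox per worker (all hypothetical names).
inbox = Channel{Tuple{Int,Vector{Float64}}}(32)
outboxes = [Channel{Vector{Float64}}(32) for _ in 1:3]

# Each worker task sends one emigrant, tagged with its own index.
@sync for w in 1:3
    @async put!(inbox, (w, fill(Float64(w), 2)))
end
close(inbox)   # no more emigrants; iteration below drains what is buffered

# Master loop: forward every emigrant to all workers except its sender.
for (src, x) in inbox
    for (k, ch) in enumerate(outboxes)
        k == src || put!(ch, x)
    end
end
```

A real master would keep listening rather than closing the inbox after one round. Closing here only keeps the example finite.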

It's definitely not for production use yet, because there's no handling of exceptions in the worker processes.
Also, I don't know how efficient this scheme is in terms of convergence, or whether there's a way to improve it. At least in my experiments I can run 8 workers in parallel, and the immigrant acceptance rate is around 20%.

robertfeldt · 2015-07-21T05:08:29Z

Interesting. This is a classic Master/Slave optimization scheme then; it might be useful and should be able to speed up convergence (vs time). However, it currently fails the Travis CI build (0.3 issue?), so I'm not sure how we should handle this right now.

alyst · 2015-07-21T09:08:17Z

Indeed it does speed up the convergence (interestingly, contrary to what R's DEoptim recommends (population size = 10 x number of parameters), for my problem it works best with small (50) population on each worker).

I guess it will never run on 0.3, because 0.3 lacks the key functionality. Since the PR uses MessageUtils master, which has no tagged version, it is not straightforward to run on 0.4 either.
When/if the new parallel building blocks land in Julia master, I will update the PR. It could be disabled on 0.3 to make the tests pass.

So it's not ready to merge, but I thought it was worth publishing as a PR to provide a parallelization starting point for those who might be interested.

matthieugomez · 2015-07-21T13:01:19Z

Parallelization is great.

The functions I optimize generally use a lot of temporary memory, so I generally add a set of preallocated arrays as an argument. When optimizing the function, I preallocate the arrays and then feed bboptimize a closure:

pa = PreallocateArray()
bboptimize(x -> f(x, pa))

I'm not sure how closures work with parallel workers. One solution would be to recreate the arrays at every iteration:

bboptimize(x -> f(x, PreallocateArray()))

but maybe there's a better way. It'd be great if this parallel solution handled these cases.

alyst · 2015-07-21T13:48:44Z

@matthieugomez I think we should be able to safely combine preallocation and parallelization -- we just need to be sure that each worker receives its own copy of the "functional object" f.
Actually, I'm already using a similar approach that combines well with parallelization. For my problem I define a BBF <: OptimizationProblem type. It requires a few trivial methods: fitness_scheme(::BBF), name(::BBF), numobjectives(::BBF), numdims(::BBF), search_space(::BBF), and, most importantly, the fitness(x::Vector{Float64}, f::BBF) function, i.e. my f(x).
In fitness(x, f) I can take advantage of whatever data or temporary arrays are stored in BBF. The important point is that temporary array allocation happens on demand, i.e. a freshly created f::BBF object does not have any temporary arrays, so its copies can be safely distributed among the workers, and then each worker has its own temporary memory automatically allocated when f(x) is first called.
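A minimal sketch of that on-demand allocation pattern, in current Julia syntax, could look like the following. `LazyProblem`, its fields and the toy objective are hypothetical; the type deliberately does not subtype OptimizationProblem or implement the other required methods, so that the example runs without the package.

```julia
# Hypothetical problem type: fixed data plus a scratch buffer that starts empty.
mutable struct LazyProblem
    target::Vector{Float64}
    scratch::Vector{Float64}   # stays empty until the first fitness call
end
LazyProblem(target) = LazyProblem(target, Float64[])

# Same argument order as fitness(x, f) above; allocates scratch space lazily.
function fitness(x::Vector{Float64}, p::LazyProblem)
    if length(p.scratch) != length(x)
        resize!(p.scratch, length(x))   # on-demand, once per copy of `p`
    end
    p.scratch .= x .- p.target          # reuse the buffer on later calls
    return sum(abs2, p.scratch)
end

p = LazyProblem([1.0, 2.0])
q = deepcopy(p)   # stands in for the copy a worker would receive
```

Because `q` is copied before any fitness call, it carries no temporary memory; the first `fitness(x, q)` call gives it a buffer of its own that is never shared with `p`.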

In v0.4 it could be even simpler. You can define Base.call(f::F, x) method and then you can do f(x) without any need to create closures, which currently have poor performance in Julia. The only problem is that f would not be of type Function and e.g. Optim methods do not accept such arguments, but that should be easy to fix.
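For readers on later Julia versions: the `Base.call` overload mentioned here was a v0.4 spelling, and from Julia 0.5 on the same thing is written with the `(f::T)(x)` method syntax. A small sketch with a hypothetical `Shifted` type:

```julia
# Hypothetical callable type: carries its own parameter, no closure needed.
struct Shifted
    shift::Float64
end

# Make instances callable (the modern spelling of Base.call(f::Shifted, x)).
(f::Shifted)(x) = sum(abs2, x .- f.shift)

f = Shifted(1.0)
value = f([1.0, 3.0])          # calls the method defined above
is_function = f isa Function   # false: `f` is callable but not a Function
```

The last line shows the caveat raised above: code that restricts its argument to `Function` will not accept such an object unless the type is declared as `<: Function` or the restriction is relaxed.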

matthieugomez · 2015-07-21T14:07:28Z

Great. Thanks for the explanation.

robertfeldt · 2015-07-21T14:43:45Z

Suggestion: Allow multiple different WorkerMethods. That might increase robustness on some problems (since a single optimizer is never best for all problems) and might also help convergence (worse optimizers might get out of local optima by "inspiration" from a better optimizer etc). Would be very nice if one can have also non-population optimizers in there since some of them are very much quicker on "simple" problems and it is well known that DE benefits from early "help" with a few good solutions in the population.

alyst · 2015-07-21T16:21:46Z

Definitely that would be very nice to have. But that requires more advanced communication to solve the load balancing problems, because the current scheme assumes migration rates are the same for all workers. Ultimately, it would be nice to have something like a parallelized version of Amalgam.

robertfeldt · 2015-07-21T19:12:32Z

Ok, yes.

Amalgam is very nice, but that is parallelism on a lower level. At some point we should experiment with a parallel eval "block" size that we do pmap on. If we are only sending numdims floats over in a vector, I guess it should be OK for all but the very simple fitness functions.
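One possible reading of the "block" idea, sketched with the Distributed standard library of current Julia: `pmap` accepts a `batch_size` keyword that ships several inputs to a worker per message. The helper name `evaluate_blocked` and the toy objective are hypothetical, and whether this matches the intended design is an assumption.

```julia
using Distributed

# Evaluate all candidates, sending them to workers `block_size` at a time.
# With no extra worker processes added, pmap simply runs on the master.
evaluate_blocked(f, candidates; block_size = 8) =
    pmap(f, candidates; batch_size = block_size)

candidates = [fill(Float64(i), 3) for i in 1:10]
fitnesses = evaluate_blocked(x -> sum(abs2, x), candidates; block_size = 4)
```

Results come back in the same order as `candidates`. A named fitness function would have to be defined on every worker (e.g. with `@everywhere`) before being passed in; the anonymous function here is serialized along with the call.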

steven-varga · 2015-07-24T11:50:43Z

Hi

I am wondering whether an MPI implementation of the above idea would be welcome? If so: I tried to find the 'ParallelPopulationOptimizer' code, but somehow failed when I forked Alexey Stukalov's repo.

How can I get a copy/clone of the relevant files?
best,
steve

alyst · 2015-07-24T12:28:17Z

@steven-varga Hi, if you want to distribute the calculations over several machines, MPI could be a way to go (you can also consider ZeroMQ or use non-local cluster managers -- the latter should require minimal modifications to the current code). For one machine I think the current approach is better, because it uses native Julia task scheduling and data messaging.

If you want to test this pull request in Julia, you can try

Pkg.add("MessageUtils")
Pkg.checkout("MessageUtils") # parallel optimizer requires the master of MessageUtils
Pkg.clone("https://github.com/alyst/BlackBoxOptim.jl.git")
Pkg.checkout("BlackBoxOptim", "parallel_pop_opt") # the branch name is a positional argument

But be warned that the code is still experimental. Also, I've rebased it on top of the new BlackBoxOptim API, but haven't tested it yet. Pls let me know if you have further questions.

alyst · 2015-08-06T20:48:41Z

I've updated the code to use Channels, which were very recently introduced into Julia 0.4-dev, so MessageUtils is no longer required. However, the "smoketest" using the parallel optimizer stalls for some yet-to-be-investigated reason.

alyst · 2015-08-10T23:42:31Z

Updated to use the recently introduced type-parameterized RemoteRefs and parallel exception handling.
The tests now pass on 0.4-nightly (the current CI failure is due to the XNES smoketest randomly failing).
It is starting to be more usable, as there should now be no stalls due to worker processes silently crashing.

robertfeldt · 2015-11-28T10:21:46Z

@alyst This can now be closed, right, since it was superseded by the ParallelEvaluator code?

alyst · 2015-11-28T10:35:17Z

@robertfeldt ParallelEvaluator covers methods that generate multiple candidates at once, e.g. NES. This master/slave optimizer [better] fits DE-like algorithms. So both can co-exist.
I haven't rebased the PR recently (now that ParallelEvaluator is in, I will do it shortly), but it's in a working state; I'm using it in combination with DE for my problems.

I would leave it as a PR. At least it gives a starting point for parallelization of multiple different optimizers.

robertfeldt · 2015-11-28T10:43:05Z

Ok, yes, I remember now. Ok, let's leave this as a PR for now. It would be great if you could take a look at the parallel eval example so we can help guide people on its use. I guess we should use a NES alg in it, then.

alyst force-pushed the parallel_pop_opt branch from 4894a35 to 4429258 Compare July 24, 2015 08:39

alyst force-pushed the parallel_pop_opt branch from 4429258 to 09b0641 Compare August 6, 2015 20:41

alyst force-pushed the parallel_pop_opt branch 2 times, most recently from 520bebd to 20a2a69 Compare August 10, 2015 23:36

alyst force-pushed the parallel_pop_opt branch 3 times, most recently from 9f09ea9 to 8310746 Compare August 19, 2015 11:41

add ParallelPopulationOptimizer

ed6702d

alyst force-pushed the parallel_pop_opt branch from 8310746 to ed6702d Compare August 30, 2015 23:47

alyst mentioned this pull request Sep 17, 2015

add ParallelEvaluator #34

Merged

robertfeldt mentioned this pull request Jun 19, 2018

Running BlackBoxOptim on a Cluster #84

Open

alyst mentioned this pull request Sep 9, 2015

Ctrl-C when master process is waiting for crashed workers JuliaLang/Distributed.jl#29

Open
