This issue was moved to a discussion. You can continue the conversation there.
Basic gradient_descent optimization fails #86
Comments
Interesting.
This yields exactly the same numbers as your gradient, but also on other manifolds than R2 (your manifold is essentially R2).

Then, to see a little more of what is happening (though I saw directly what the reason is), you can do

```julia
xMean = gradient_descent(M, F, gradF, x0; debug=[:Iteration, :Cost, "\n", 10])
```

to print a line with the iteration and the cost (followed by a newline) every 10th iteration. If you then look at the default values of gradient_descent, you can see that it works with a constant step size by default (machine learning people call that a learning rate), which is not guaranteed to converge (ever). But you can exchange that – see https://manoptjl.org/stable/plans/index.html#Stepsize-1 – for a step size that might fit better, for example the Armijo step size rule; its defaults are chosen to be just fine, so

```julia
julia> xMean = gradient_descent(M, F, gradF, x0; stepsize=ArmijoLinesearch(), debug=[:Iteration, :Cost, "\n", 10])
Initial F(x): 2.831969279439222
# 10  F(x): 0.2578005738010819
# 20  F(x): 0.2000672051680747
# 30  F(x): 0.20000006190528008
# 40  F(x): 0.20000000005719792
# 50  F(x): 0.2000000000000529
# 60  F(x): 0.20000000000000012
2-element MVector{2, Float64} with indices SOneTo(2):
 1.0000197125126835
 0.9999999998505151
```

Is this close enough? If not, you can tweak the stopping criterion – see https://manoptjl.org/stable/solvers/index.html#StoppingCriteria-1 – and note that you can combine stopping criteria just with `|`:

```julia
julia> xMean = gradient_descent(M, F, gradF, x0;
           stepsize=ArmijoLinesearch(),
           stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(10.0^-12),
           debug=[:Iteration, :Cost, "\n", 25]
       )
Initial F(x): 2.831969279439222
# 25  F(x): 0.2000020367614238
# 50  F(x): 0.2000000000000529
2-element MVector{2, Float64} with indices SOneTo(2):
 1.0000197125126835
 0.9999999999999954
```

Let me know if this solves your problem. One could for example change the default to use Armijo – though that per default needs quite some function evaluations to determine the step size. |
Addendum, after clicking on your link – it seems the allocate in ManifoldsBase does something strange in NelderMead!(M.manifold, F, [random_point(M) for _=1:3]). Note that Nelder-Mead is not a very efficient algorithm and should only be used if you do not have any gradient. edit: sorry, my bad :) I do not usually work with NelderMead. NelderMead(M.manifold, F, [random_point(M) for _=1:3]) just works fine, too, as does NelderMead(M.manifold, F). But note that its convergence is veeeeeery slow (the default 2000 iterations are not enough). |
Hi Ronny, Johan – thanks, this helps me follow better too.
hmm, makes me curious about #64 again. I'll check when I get a chance and comment there if something shows up. |
No, that was my mistake at first glance – the error is that you have to provide a population not a starting point. Then everything works :) |
Oh, and a final look, since you used BFGS in Optim. We have that on manifolds, too, just use

```julia
julia> quasi_Newton(M, F, gradF, x0)
2-element MVector{2, Float64} with indices SOneTo(2):
 1.0204551069445331
 1.000000000000014
```

whose default is the (inverse) BFGS here, too. |
Ah, got it, thanks! |
Perfect. Let me know when you think a default should be adapted or the docs should be changed (for example at first glance Stopping Criteria and Step Size rules could be made a little more prominent). |
Maybe just a |
Where? The default step size is mentioned at every algorithm, as are the default stopping criteria. |
I made a PR on where I would add it... Feel free to ignore if that does not fit/work. Reason for putting it there is that when following a design template example for the first time, one expects the defaults to work (regardless of performance). When I get stuck on the first example, I find myself scrolling to the bottom of the "Getting Started" page hoping to find the obvious stumbling blocks. FAQ is second best place (but can be a discouraging search). Once a user sees their own first example working (different than just copy paste), then you start to invest more time to read deeper into all the docs. Training wheels first with absolute minimum details to get going, then scale up more and more on details from there. |
Manopt.jl does not have a FAQ. |
Thanks a lot, I did find stopping criteria and played around with it. However, I expected the default to converge in such a simple example. There are quite a few stumbling blocks for me, but I can already see the potential of Manopt. I know it's a relatively new package and perhaps we can help build out examples.
So I'm in agreement with this, and that defaults should rather work (with the possible cost of performance, with a note or something), especially if the user comes from other packages like Optim or JuMP, where one can be a user without understanding the underlying algorithm. Perhaps something like this from the Optim.jl docs will help a new user like me a lot:
|
It's perhaps not part of this issue, but let me list some problems I encounter:

```julia
julia> Manopt.NelderMead(MP, F, xp)
ERROR: MethodError: no method matching NelderMeadOptions(::ProductRepr{Tuple{ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}}}, ::StopAfterIteration; α=1.0, γ=2.0, ρ=0.5, σ=0.5, retraction_method=ExponentialRetraction(), inverse_retraction_method=LogarithmicInverseRetraction())
```

or perhaps other errors, e.g.:

```julia
julia> xMean = quasi_Newton(M, F_prior, gradF_prior, x; debug=[:Iteration, :Cost, "\n", 10])
Initial F(x): 1.4945569745366585
ERROR: vector_transport_to! not implemented on ProductManifold(TranslationGroup(2; field = ℝ), SpecialOrthogonal(2)) for arguments ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}} and ParallelTransport.

julia> particle_swarm(M, F_prior; n, x0)
ERROR: ArgumentError: broadcasting requires an assigned BroadcastStyle

julia> gradF(MP, xp)
ERROR: MethodError: no method matching fill!(::ProductRepr{Tuple{MVector{2, Float64}, MMatrix{2, 2, Float64, 4}}}, ::Int64)
```

The other issues have more to do with a lack of knowledge, coming from "I just want to optimize my cost function", such as
I hope this helps improve the quality of this great package – let me know if there is anywhere I can help (although I still have a lot to learn). |
For the first and simple example there is a certain tradeoff, especially on manifolds. We can still, for sure, switch to Armijo; I don't have a strong preference there. The summary sounds like a good idea, and it holds here completely analogously for Newton with TR and for LBFGS (BFGS uses https://manoptjl.org/stable/solvers/quasi_Newton.html#Manopt.AbstractQuasiNewtonDirectionUpdate; switch to https://manoptjl.org/stable/solvers/quasi_Newton.html#Manopt.QuasiNewtonLimitedMemoryDirectionUpdate for the LBFGS one, for which it might also be good to have a shorter constructor). Additionally, for non-smooth functions, if you have proximal maps, use CyclicProximalPoint or DouglasRachford (Optim does not have those, but then this package also has a little different focus, namely to include non-smooth optimisation on manifolds), or, if you are adventurous, the ChambollePock algorithm from the most recent paper. For the summary – |
Thanks! A summary like that is perfect. I was actually looking for LBFGS, but didn't recognize it until you made it obvious just now 🤦 |
For this one I cannot provide more details without more information – where was it thrown, how did you call NelderMead, what are the variables, what is the call stack? I can sadly not just guess from a signature what is wrong.
This seems more like an issue for Manifolds.jl, since
Again
What is
Cannot be answered generically; it depends heavily on the algorithm and the problem you use.
AD is something that is not yet well covered (see my note on how many developers are here – just me –), so the only AD support is the one you already used in Manifolds.jl. AD is on my list, but that alone might still take quite some time.
I do not understand this. What is Mahalanobis? Where do you want to work with covariances? As a manifold? |
This might mean, as I said, that LBFGS should get a nice short constructor and a nice tutorial. This algorithm was the bachelor thesis of a student of mine, and a tutorial was not part of that ;) |
Yes sorry, my intention was not to get the answers to the questions (thanks for trying though) but to just list general stumbling blocks. |
Sure, carefully craft MWEs or MnWEs and open issues – but also look where the issue might come from, i.e. whether it is more a Manopt or a Manifolds issue. |
This probably refers to the way the covariance matrix of a multivariate normal distribution can be obtained from the Hessian of the logpdf, like here: https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470824566.app1 . I guess there should be a manifold equivalent for some generalization of the multivariate normal distribution? And this is the Mahalanobis distance: https://en.wikipedia.org/wiki/Mahalanobis_distance – it uses the covariance matrix. |
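For reference, here is what the (Euclidean) Mahalanobis distance from the linked Wikipedia article computes – a minimal plain-Julia sketch using only LinearAlgebra; the mean, covariance, and test points are made-up example values:

```julia
using LinearAlgebra

# Mahalanobis distance between a point x and a distribution with
# mean mu and covariance Sigma: sqrt((x - mu)' * inv(Sigma) * (x - mu)).
function mahalanobis(x, mu, Sigma)
    d = x - mu
    return sqrt(dot(d, Sigma \ d))   # solve instead of explicitly inverting
end

mu = [0.0, 0.0]
Sigma = [4.0 0.0; 0.0 1.0]           # variance 4 along the first axis

mahalanobis([2.0, 0.0], mu, Sigma)   # 1.0: two Euclidean units = one standard deviation
mahalanobis([0.0, 2.0], mu, Sigma)   # 2.0: same Euclidean length, two standard deviations
```

So the covariance rescales distances per direction; the discussion above is about doing the analogous rescaling in a tangent space of the manifold.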
Ah, that was too long a way for me to get there from this short note – especially since I am more into optimization and less into distributions. Thanks for the information; sure, this could be used in tangent spaces to some extent, though I am not sure about the details. |
Yes, that is spot on. If you know of any good sources on the subject it would be great. I still have a lot to learn. |
Unfortunately, I still don't get the results I expect in general; here is the basic example updated to SE(2). I experimented with step size but it didn't seem to make a difference.

```julia
using Manopt
using Manifolds
using StaticArrays

# "factor"
function Prior_distance(M, meas, p)
    return distance(M, meas, p)
end

M = SpecialEuclidean(2)
priorx1 = ProductRepr(SA[1.1, 1.0], SA[0.707107 -0.707107; 0.707107 0.707107])
priorx2 = ProductRepr(SA[0.9, 1.0], SA[0.707107 -0.707107; 0.707107 0.707107])

# Factor graph with 2 priors on variable x
function F(M, x)
    return Prior_distance(M, priorx1, x) + Prior_distance(M, priorx2, x)
end
function gradF(M, x)
    Manifolds.gradient(M, (x) -> F(M, x), x)
end

# start value
x0 = ProductRepr(SA[0., 0], SA[1. 0; 0 1])

# test cost function
F(M, x0)
gradF(M, x0)

# run
xMean = gradient_descent(M, F, gradF, x0; stepsize=ArmijoLinesearch(), debug=[:Iteration, :Cost, "\n", 10])
```
```
Initial F(x): 3.600341492608454
# 10  F(x): 3.526239029861806
# 20  F(x): 3.5165928358291243
# 30  F(x): 3.5165823494284916
# 40  F(x): 3.516582337500095
# 50  F(x): 3.516582337489366
# 60  F(x): 3.5165823374868284
# 70  F(x): 3.5165823374884537
# 80  F(x): 3.5165823374881358
# 90  F(x): 3.516582337486814
# 100  F(x): 3.5165823374881002
# 110  F(x): 3.516582337487348
# 120  F(x): 3.516582337489555
# 130  F(x): 3.5165823374886473
# 140  F(x): 3.516582337487341
# 150  F(x): 3.516582337488459
# 160  F(x): 3.5165823374871947
# 170  F(x): 3.5165823374887344
# 180  F(x): 3.5165823374883733
# 190  F(x): 3.5165823374862595
# 200  F(x): 3.516582337488268
ProductRepr with 2 submanifold components:
 Component 1 =
  2-element MVector{2, Float64} with indices SOneTo(2):
   0.9999999999999774
   1.0000000000001388
 Component 2 =
  2×2 MMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
   -0.440261  -0.89787
    0.89787   -0.440261
```

The same code errors with quasi_Newton, see the issue at JuliaManifolds/Manifolds.jl#384. |
What do you expect? |
The rotation part should be
If you look at the gradient and cost function of the answer it also shows it's not working:
|
Then I think your gradient is wrong? Keep in mind that AD gradients on manifolds (in Manifolds.jl) are experimental only and currently rely heavily on the embedding. Please try with the gradient I proposed. |
I just tried, it's exactly the same. |
Hm, the gradient really looks strange: since the points are identical in the second component, I have no clue why the gradient is not zero there from the start. I currently do not have the time to invest, but it might be reasonable to just look at an example on |
I think you are running into a problem with convexity. Keep in mind that all algorithms here (and in Optim) require convexity for convergence to a minimiser. On manifolds (not all, but for example the sphere or rotations) this is a huge problem, since there is no global convexity, so you have to start close enough. Just as a short check, take Then surely the gradient is zero and gradient descent stops even before the first iterate. But your starting point might be too far away, I think. So this is – I think – inherently a problem on non-Hadamard manifolds, to drop the terminology, i.e. manifolds with finite injectivity radius. |
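To illustrate the convexity point with a plain-Julia sketch (no Manopt involved, just angles): identifying SO(2) with the circle via the rotation angle, the squared distance to a fixed point is convex only up to the injectivity radius pi; at the cut locus it has a kink, and beyond it the function decreases again.

```julia
# Geodesic distance on the circle S^1, points given as angles.
# The injectivity radius is pi: beyond it, the shortest geodesic wraps around.
circ_dist(a, b) = abs(mod(a - b + pi, 2pi) - pi)

# f(theta) = d(theta, 0)^2 is smooth and convex only for theta in (-pi, pi);
# theta = pi is the cut locus of 0, where the "gradient" flips sign.
f(theta) = circ_dist(theta, 0.0)^2

f(pi / 2)        # (pi/2)^2, inside the convexity region
f(pi - 1e-8)     # close to the maximum pi^2
f(pi + 1e-8)     # also close to pi^2: stepping past pi decreases f again
```

This is the finite-injectivity-radius effect described above: a gradient method started past the cut locus of the minimiser walks toward a different representative of the same point.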
Ok, thank you. I'll investigate some more at a later stage. As a last example, I took your advice and tried with Rotations only and changed the initial value to be closer to the solution, without success.

```julia
using Manopt
using Manifolds
using StaticArrays

M = SpecialOrthogonal(2)
priorx1 = SA[0.707107 -0.707107; 0.707107 0.707107]

function F(M, x)
    return distance(M, priorx1, x)
end
function gradF(M, x)
    grad_distance(M, priorx1, x, 1)
end

# start value
x0 = SA[1. 0; 0 1]
x0 = SA[0.632981 -0.774167; 0.774167 0.632981]
# x0 = SA[0.707107 -0.707107; 0.707107 0.707107]

# test cost function
F(M, x0)
gradF(M, x0)

# run
stopping_criterion = StopWhenAny(StopAfterIteration(2000), StopWhenGradientNormLess(10.0^-8))
stepsize = ArmijoLinesearch()
# stepsize = ConstantStepsize(0.1)
# stepsize = WolfePowellBinaryLinesearch()
debug = [:Iteration, :Cost, "\n", 10]
xMean = gradient_descent(M, F, gradF, x0; stopping_criterion, stepsize, debug)
F(M, xMean)
gradF(M, xMean)
``` |
Yeah, but again, look at the gradient, and you can also

then you see: your gradient norm is 1, your step size ends up being 7, and your injectivity radius (in this situation maybe interpretable as the maximum step size that is actually reasonable) is less than 5, from the second iteration onwards. So I would maybe not recommend Armijo here, due to the nonconvexity. |
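As a plain-Julia sanity check of those numbers (assuming the Frobenius-induced bi-invariant metric; the sqrt(2) scaling is an assumption about the convention used, so treat the exact constants as illustrative): the distance on SO(2) is sqrt(2) times the relative rotation angle, so no two rotations are further apart than sqrt(2)*pi ≈ 4.44, and a step of length 7 overshoots any geodesic.

```julia
using LinearAlgebra

# Rotation by angle a in SO(2)
rot(a) = [cos(a) -sin(a); sin(a) cos(a)]

# Riemannian distance on SO(2) with the Frobenius metric:
# sqrt(2) * |relative rotation angle|  (scaling convention assumed)
function so2_dist(R1, R2)
    Q = R1' * R2                          # relative rotation
    return sqrt(2) * abs(atan(Q[2, 1], Q[1, 1]))
end

so2_dist(rot(0.0), rot(1.0))   # sqrt(2) ≈ 1.414
sqrt(2) * pi                   # ≈ 4.44: the farthest two rotations can be
```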
Ah, there is one other major issue with your function. edit: I just noticed that your last example is just one point. I am confused that it does not finish in one step, since the gradient points directly to the minimiser anyway. Maybe there is some issue with Rotations, but I did not write that manifold, so I would first have to invest quite some time checking that code, for which I do not have time at the moment (moved internationally, started a new job, these things) – ah no, Rotations itself looks ok. |
I'm still a beginner with manifold-valued statistics. Sources tend to be very heavy on math. Xavier Pennec does some really cool work on that topic (see for example https://hal.inria.fr/inria-00614994/PDF/Pennec.JMIV06.pdf ), but there may be better sources. @sethaxen, could you recommend some other good sources for manifold statistics? Regarding minimization, I'll try to find some time tomorrow to check it. One quick note, @kellertuer, |
Congrats! No worries on the time delay – I know these discussions take up a lot of time; thanks for all the help! Then also, thanks for the references. I am in need of an on-manifold covariance estimator myself and will be building one at some point. I'll share back anything useful (xref JuliaRobotics/IncrementalInference.jl#1280). |
Thanks – I can just not recommend moving between countries in these times; border rules are a little complicated these days. Xavier Pennec is for sure a perfect source for statistics on manifolds.
The problem is that I am not aware of works that take it into account; hence the line search here follows the same ideas as in Manopt and pymanopt. The reason is that even if you exceed the (global) injectivity radius, it might (a) be locally ok, because the radius at that point is larger, or (b) even be globally ok to do the step in that case, though it is not injective. So there is no good rule for when to do or not to do a too-large step. Here, however, I am very sure this kills the example. |
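One way to see the effect of capping steps at the injectivity radius is a hypothetical plain-Julia sketch (none of these names are Manopt API, and the clamping rule is made up for illustration): gradient descent on the circle where the tentative step length is clamped below pi, so no single step can jump across the cut locus.

```julia
# Hypothetical sketch, not Manopt API: descent on the circle (angles in (-pi, pi])
# with the step length clamped below the injectivity radius pi.
function clamped_descent(grad, theta0; step = 0.5, maxstep = pi - 1e-3, iters = 100)
    theta = theta0
    for _ in 1:iters
        g = grad(theta)
        s = min(step * abs(g), maxstep)          # cap the step length
        theta = mod(theta - sign(g) * s + pi, 2pi) - pi   # step and re-wrap
    end
    return theta
end

# minimise d(theta, target)^2; its gradient is twice the signed distance
target = 1.0
grad(theta) = 2 * (mod(theta - target + pi, 2pi) - pi)

clamped_descent(grad, -2.0)   # converges to ~1.0
```

Under these assumptions the cap never hurts a legitimate step (no minimiser is further than the injectivity radius away along the chosen geodesic), which is the intuition behind limiting the line search discussed above.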
Regarding parallel transport, this question: https://math.stackexchange.com/questions/2066207/how-can-one-define-parallel-transport-on-lie-groups relates parallel transport on Lie groups to differentials of left translation. Our metric on SO(n) is bi-invariant, so parallel transport should agree with left

And, regarding statistics on manifolds, one additional thing: I have this book https://www.amazon.pl/Riemannian-Geometric-Statistics-Medical-Analysis/dp/0128147253 on my to-read list; it seems to be not too dense in math. |
If I understand this correctly (see chapter 5.3 of that book, preprint available here: https://hal.inria.fr/hal-02342137/document ), the parallel transport of the (-) Cartan-Schouten connection is the identity on SO(n), but it's different from the Levi-Civita connection. |
* Update src/tutorials/MeanAndMedian.jl xref #86 (comment) Co-authored-by: Ronny Bergmann <[email protected]>
Ahh, thanks. I was working from our factors and missed it, since it's done with the Mahalanobis distance in our case. It should be distance squared. I still can't get a satisfactory result, though. Perhaps I should take a step back and rather ask how to solve on manifolds such as SE(2), SE(3), SO(2), SO(3). Perhaps start with SE(2) for a POC.

```julia
# M - manifold
# p and q are points on the manifold
# m is a measurement from point p
# meas is the prior measurement
function PointPoint(M, m, p, q)
    q̂ = compose(M, p, m)
    return distance(M, q, q̂)^2
end
function Prior(M, meas, p)
    return distance(M, meas, p)^2
end
```

The factors are combined into cost functions that depend on the structure of the factor graph. |
Distance squared is better; that is geodesically convex (so locally it is convex). For distance squared, keep in mind that globally the minimiser might still be non-unique. For example, on the sphere, minimise with respect to x the function distance(M, n, x)^2 + distance(M, s, x)^2, where n and s are the north and south pole, respectively. Then any x on the equator is a minimiser of the function. Your last sentence I again can't follow: what factors? What is to combine here? Which graph? What is its structure? |
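The equator claim can be checked numerically with a plain-Julia sketch of the sphere distance (arccos of the inner product of unit vectors; no Manifolds.jl needed):

```julia
using LinearAlgebra

# Geodesic distance on the unit sphere: the angle between the unit vectors
sphere_dist(p, q) = acos(clamp(dot(p, q), -1.0, 1.0))

n = [0.0, 0.0, 1.0]    # north pole
s = [0.0, 0.0, -1.0]   # south pole
cost(x) = sphere_dist(n, x)^2 + sphere_dist(s, x)^2

# every equator point is pi/2 from both poles, so the cost is
# 2 * (pi/2)^2 = pi^2 / 2 everywhere on the equator
cost([1.0, 0.0, 0.0])             # pi^2 / 2
cost([0.0, 1.0, 0.0])             # pi^2 / 2
cost(normalize([1.0, 1.0, 0.0]))  # pi^2 / 2
```

So the whole equator is a one-dimensional set of minimisers, which is exactly the non-uniqueness described above.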
Hi, I found something interesting. If I set |
That is good to know :) Retractions should locally be as fine as exp (and inverse retractions as fine as log), but are usually cheaper or numerically more stable. So good that you checked. |
That is where IncrementalInference.jl comes in, in short, you will get the ring around the equator as a belief if you set it up that way. For example figure 1 in https://marinerobotics.mit.edu/sites/default/files/fourie_iros19_manifolds.pdf has a similar case on a circular manifold. |
Sorry, an explanation would be too much for a comment. Perhaps a basic example:

```julia
# for a user-defined graph that looks like this:
#   p -- x1 -- f1 -- x2 -- f2 -- x3
#         \                     /
#          -------- f3 ---------
# the cost would look like this:
cost(M, x) = Prior(M, p, x[1])^2 + PointPoint(M, f1, x[1], x[2])^2 +
             PointPoint(M, f2, x[2], x[3])^2 + PointPoint(M, f3, x[1], x[3])^2
``` |
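To make the combination concrete, a plain-Julia sketch of the same graph on R^2 as a stand-in manifold (distance is the Euclidean norm, compose is vector addition; the measurement values are made up). Note that the Prior and PointPoint factors earlier in the thread already return squared distances, so this sketch sums them without squaring again:

```julia
using LinearAlgebra

# Euclidean stand-in for the manifold operations: distance is the norm of the
# difference, compose is vector addition (i.e. the translation group R^2).
dist(p, q) = norm(p - q)
compose(p, m) = p + m

Prior(meas, p) = dist(meas, p)^2
PointPoint(m, p, q) = dist(q, compose(p, m))^2

# graph: p -- x1 -- f1 -- x2 -- f2 -- x3, with a loop closure f3 from x1 to x3
p  = [0.0, 0.0]                                 # prior measurement on x1
f1 = [1.0, 0.0]; f2 = [1.0, 0.0]; f3 = [2.0, 0.0]  # relative measurements

cost(x) = Prior(p, x[1]) + PointPoint(f1, x[1], x[2]) +
          PointPoint(f2, x[2], x[3]) + PointPoint(f3, x[1], x[3])

# a fully consistent assignment has zero cost
cost([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # 0.0
```

The point of the stand-in is only the structure: each factor contributes one squared-residual term, and the graph topology decides which variables each term couples.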
Hi I'm new to Manopt and I'm trying to figure out how to integrate Manopt into IncrementalInference (also see JuliaRobotics/IncrementalInference.jl#1276 for more complex tries).
The optimization is formulated as a factor graph and this example represents a simple factor graph with one variable and 2 priors (so basically the minimum of 2 points). I don't get the results I expect and I made this MWE:
cc @dehann
EDIT: I'm on
[1cead3c2] Manifolds v0.5.4
[0fc0a36d] Manopt v0.3.9