Feature Request: Second Derivatives for User Defined Functions #1198

Closed
UserQuestions opened this issue Mar 14, 2018 · 10 comments · Fixed by #2961
Labels
Category: Nonlinear (Related to nonlinear programming) · Type: Feature request

Comments

@UserQuestions

It would be extremely helpful for JuMP to support second derivatives for user-defined functions. Ideally this could be done as efficiently as ReverseDiffSparse, but even just calling ForwardDiff.hessian! would be a helpful option. There is a broad class of problems that require optimizing functions that do not translate easily into the typical JuMP syntax (especially in ML/statistics), so having JuMP handle such cases would be a huge benefit to people working on those problems and would greatly expand the number of potential JuMP users.
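For context, a minimal sketch of the current user-defined function support this request would extend (my own illustration, not part of the original request, and using later JuMP syntax rather than the 0.18 syntax current when this issue was opened). The point is that `register` has no argument for a Hessian, so exact second derivatives are unavailable for registered multivariate functions and Ipopt falls back to a quasi-Newton approximation; `loss` below is a hypothetical stand-in for something hard to write algebraically:

```julia
using JuMP, Ipopt

# Hypothetical stand-in for a loss that is awkward to write algebraically
# (imagine it wraps a simulation or an inner solve rather than a formula).
loss(a, b) = (a - 1)^2 + 100 * (b - a^2)^2

model = Model(Ipopt.Optimizer)
@variable(model, a, start = 0.0)
@variable(model, b, start = 0.0)

# `register` accepts the function and, optionally, a hand-written gradient,
# but there is no slot for a Hessian; with a multivariate user-defined
# function present, exact second derivatives are unavailable and Ipopt uses
# a limited-memory (quasi-Newton) Hessian approximation instead.
register(model, :loss, 2, loss; autodiff = true)

@NLobjective(model, Min, loss(a, b))
optimize!(model)
```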

@mlubin
Member

mlubin commented Mar 14, 2018

I'd be curious if you could point to one or a couple of examples in ML/statistics where there's a significant benefit from using methods that require second-order derivatives over the first-order methods that JuMP already supports with user-defined functions. This would help justify the implementation effort. (Either way, I don't expect to spend time on this in the next few months given everything else going on with JuMP development.)

@UserQuestions
Author

Thanks for your prompt response, and thanks for creating this amazing package.

Virtually all forms of machine learning and statistics involve choosing an optimal parameter theta such that an in-sample or out-of-sample loss function L(y, f(x, theta)) is minimized, where x is the data being used to model or predict y. The predictive/modeling function f(x, theta) can be very complex to articulate because it is often designed either to be very flexible or to reflect the rules of a true data-generating process. One example is neural nets, which use an f(x, theta) that is the iterated composition of many linear and nonlinear functions as a flexible predictive function. In biostatistics, the data-generating process f(x, theta) often requires computing the evolution of a complex system. In economics, f(x, theta) can involve computing equilibrium strategies for the firms or individuals that generated the data (this may even require solving for a fixed point of a set-valued map, or solving nested optimization problems). I'm also happy to provide specific references to examples in some of these fields.

@UserQuestions
Author

As for why one needs second-order methods: in some cases, L(y, f(x, theta)) may have low dimension (in theta) but take an extremely long time to compute (potentially on the order of minutes, hours, or even longer). In such cases, second-order methods can help reduce the number of function evaluations required. Additionally, it is not uncommon for L(y, f(x, theta)) to have substantial cross-partial derivatives in theta, meaning that exact second-order methods will do much better at finding the true argmin than quasi-Newton Hessian approximations.

@shoshievass

Just want to second @UserQuestions here - there are a number of cases in economics (at least) that require two-step optimization, where the inner ("first") step involves solving contraction mappings, etc., and may not be feasible to describe in JuMP syntax, but whose outer optimization would benefit substantially from being able to use second derivatives.

@mlubin mlubin added the Category: Nonlinear Related to nonlinear programming label Feb 2, 2019
@raphaelchinchilla

More than two years later, I want to third @UserQuestions. It would be a game changer if JuMP could do optimization with exact Hessians, especially if the Hessian were returned as a sparse matrix. CasADi, an autodiff framework that supplies exact second-order information to Ipopt, is the only thing that keeps me in Python...

@odow
Member

odow commented Dec 7, 2020

Here's an example from Discourse where Ipopt failed to converge without the second-order information on the problem as formulated by the user: https://discourse.julialang.org/t/nonlinear-objective-function-splatting/51251 (However, it could be reformulated and solved with first-order information only.)

@raphaelchinchilla

raphaelchinchilla commented Dec 7, 2020

For all of you curious about this issue, one can pass the function, gradient, and Hessian directly to Ipopt.jl, without going through JuMP and MathOptInterface. The documentation on how to do it is somewhat hidden, but can be found in Ipopt.jl/doc/ipopt.rst (Edit: odow): https://github.com/jump-dev/Ipopt.jl#c-interface-wrapper

In my tests, one can use AD tools (such as ForwardDiff.jl or Zygote.jl) in the definition of those functions, or use ModelingToolkit.jl to compile the gradient and Hessian. I think ComponentArrays.jl could also be used, which would make defining the functions easier, but I have not tested it.
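For concreteness, here is a rough sketch of that approach for an unconstrained problem, with ForwardDiff.jl supplying the exact gradient and a dense Hessian. It follows the `createProblem`/`solveProblem` names from the C wrapper documentation linked above (later Ipopt.jl releases rename parts of this API), and `loss` is just a toy stand-in for an expensive objective:

```julia
using Ipopt, ForwardDiff

loss(theta) = (theta[1] - 1)^2 + 100 * (theta[2] - theta[1]^2)^2  # toy objective

n = 2                                  # number of variables
x_L = fill(-1e20, n)                   # effectively unbounded variables
x_U = fill(1e20, n)
m = 0                                  # no constraints
g_L = Float64[]
g_U = Float64[]

eval_f(x) = loss(x)
eval_g(x, g) = nothing                                    # no constraints
eval_grad_f(x, grad_f) = ForwardDiff.gradient!(grad_f, loss, x)
eval_jac_g(x, mode, rows, cols, values) = nothing         # empty Jacobian

# Lower triangle of the (dense) Hessian, recomputed by ForwardDiff each call.
function eval_h(x, mode, rows, cols, obj_factor, lambda, values)
    k = 1
    if mode == :Structure
        for i in 1:n, j in 1:i
            rows[k] = i
            cols[k] = j
            k += 1
        end
    else
        H = ForwardDiff.hessian(loss, x)
        for i in 1:n, j in 1:i
            values[k] = obj_factor * H[i, j]
            k += 1
        end
    end
end

nele_hess = div(n * (n + 1), 2)
prob = createProblem(n, x_L, x_U, m, g_L, g_U, 0, nele_hess,
                     eval_f, eval_g, eval_grad_f, eval_jac_g, eval_h)
prob.x = zeros(n)                      # starting point
status = solveProblem(prob)
```

Because eval_h is supplied, Ipopt uses the exact Hessian rather than its limited-memory approximation; a sparse or hand-coded Hessian could be dropped in the same way.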

I personally fail to understand why developers continue pushing MathOptInterface for nonlinear problems. It is true that it does a great job for convex problems, but for nonlinear problems it falls so far behind what one needs (no support for vectors, no way to pass the Hessian) that I would classify it as experimental at this point.

@odow
Member

odow commented Dec 7, 2020

I personally fail to understand why developers continue pushing MathOptInterface for nonlinear problems.

We're aware of the current NLP limitations of MOI: jump-dev/MathOptInterface.jl#846

We encourage people to use JuMP because many users already know the JuMP syntax, but they won't know about the specifics of computing gradients and Hessians. If you have specific needs that JuMP isn't meeting, it probably isn't the right tool for the job, and you should consider other options.

@mlubin
Member

mlubin commented Dec 7, 2020

I think there's plenty of agreement on the areas for improvement for JuMP and MOI with regard to nonlinear optimization.

I personally fail to understand why developers continue pushing MathOptInterface for nonlinear problems

I'm not sure what you mean by "pushing". We all want the state of the art to improve and are more than happy to point people to the best tools for the job. For example, I've been advocating for a CasADi interface in Julia since 2014: casadi/casadi#1105.

@raphaelchinchilla

@odow and @mlubin, I did not mean to be aggressive with my comment about "pushing"; I am sorry. English is not my first language, so I may not have expressed what I meant correctly. I have great admiration for the work that you are all doing.
