tests: add gradient tests for all backends #932
Conversation
I am not very familiar with the …
force-pushed from bcc78e0 to 4e15cdf
I pushed an extension to …
tests/test-backend-ops.cpp (Outdated)

```cpp
ggml_build_forward_expand(gf, out);
ggml_graph_cpy(gf, gb);
ggml_build_backward_expand(ctx, gf, gb, true); // TODO why can the results sometimes be wrong with keep == false?
```
Unless we are computing second-order gradients (i.e. backward of backward), `keep` should be set to `false` (though it should work with `keep == true` as well). In this case, if the results are sometimes wrong when `keep == false`, this would mean there is likely a bug somewhere.
The computation of backward graphs assumes that the initial `grad` tensors are initialized with zeros. This was the purpose of `ggml_graph_reset()`, though today this function only makes sense for the CPU backend.
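For context, a minimal sketch of the order of operations implied by the comments above, assuming the ggml API of this era; `out` (the final tensor of a forward graph) and the thread count are placeholders:

```cpp
// Sketch: building and computing a backward graph with keep == false.
// Assumes ctx is an initialized ggml_context and the inputs of the
// forward graph were marked with ggml_set_param.
struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, GGML_DEFAULT_GRAPH_SIZE, /*grads=*/true);
ggml_build_forward_expand(gf, out);

struct ggml_cgraph * gb = ggml_graph_dup(ctx, gf);
ggml_build_backward_expand(ctx, gf, gb, /*keep=*/false);

ggml_graph_reset(gf);          // zero-initialize all grad tensors
ggml_set_f32(out->grad, 1.0f); // seed the gradient of the output
ggml_graph_compute_with_ctx(ctx, gb, /*n_threads=*/1);
```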
How do I reproduce this issue?
On the last commit I also no longer have issues with `keep == false`. The problem I was seeing was likely caused by #943 and I misinterpreted the cause.
Some of the gradient tests are failing occasionally (SIN, COS, ROPE) - I guess the error threshold might have to be adjusted, but we can do that as we go.
Might want to wait for @slaren's review as well before merging.
The failure rate of …
There are a lot more cases like the last two. If every created tensor needs to be a param, do it automatically in the `ggml_new_tensor` functions in the parent class. If the reason some are disabled is that ggml does not support a backward pass for them, then extract that into a function that makes this clear, and disable it by default.
It is very important that `test-backend-ops` remains very simple to add new test cases for, and nobody is going to understand whether they need to add calls to `ggml_set_param` for their test or not.
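A minimal sketch of what the suggested helper might look like, assuming it lives in the test case base class; the name `new_param_tensor` and the opt-out flag are illustrative, not the actual test-backend-ops API:

```cpp
// Illustrative helper: create a tensor and mark it as a param by default,
// so individual test cases do not need to call ggml_set_param themselves.
// An op without backward support would pass supports_grad = false.
static ggml_tensor * new_param_tensor(ggml_context * ctx, ggml_type type,
                                      int n_dims, const int64_t * ne,
                                      bool supports_grad = true) {
    ggml_tensor * t = ggml_new_tensor(ctx, type, n_dims, ne);
    if (supports_grad) {
        ggml_set_param(ctx, t);
    }
    return t;
}
```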
tests/test-backend-ops.cpp (Outdated)

```cpp
if (ggml_is_matrix(in) && ggml_is_vector(rows)) {
    ggml_set_param(ctx, in);
}
```
Why?
tests/test-backend-ops.cpp (Outdated)

```cpp
if (op == ggml_add || ggml_are_same_shape(a, b)) {
    ggml_set_param(ctx, a);
    ggml_set_param(ctx, b);
}
```
Why?
There are some tricky edge cases like …
I think it would be enough to add some comments explaining the cases where …
force-pushed from d419343 to 8657257
I added a small bit of documentation to the top of the file to explain the intended usage, as well as a heavily commented example struct for a new op that covers the bare minimum needed for a new test. Going through the code I noticed that it's possible to avoid explicit logic for …
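For illustration, a sketch of what such a minimal, heavily commented test struct could look like, modeled on the existing test cases in `tests/test-backend-ops.cpp`; the base-class interface shown (`vars()`, `build_graph()`) follows the file's conventions, but the exact details here are assumptions:

```cpp
// Illustrative minimal test case for a new op, in the style of the
// existing structs in test-backend-ops.cpp.
struct test_example : public test_case {
    const ggml_type type;            // data type of the input tensor
    const std::array<int64_t, 4> ne; // shape of the input tensor

    // string identifying this test variant in the output
    std::string vars() override {
        return VARS_TO_STR2(type, ne);
    }

    test_example(ggml_type type = GGML_TYPE_F32,
                 std::array<int64_t, 4> ne = {10, 5, 4, 3})
        : type(type), ne(ne) {}

    // build the graph to test; tensors marked with ggml_set_param
    // get their gradients checked in the gradient mode
    ggml_tensor * build_graph(ggml_context * ctx) override {
        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
        ggml_set_param(ctx, a);
        return ggml_sqr(ctx, a);
    }
};
```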
My goal with this PR is to add tests that check gradients for the backward pass in a generic way for all backends. My current state is that I can successfully calculate and check gradients for CUDA as long as all ops are FP32 and supported. This PR is nowhere near usable but I'm opening it anyway to discuss the best way to implement the tests. Right now I'm doing the tests via modification of `tests/test-grad0` (and some hacks), but I think long-term it would make more sense to re-use the code in `tests/test-backend-ops`. From my perspective the easiest way to do this would be to extend the existing code constructs in `tests/test-backend-ops` with a new mode that checks gradients numerically. @slaren your feedback would be appreciated.
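For reference, checking gradients numerically usually means comparing the analytical gradient from the backward pass against a central finite difference. A self-contained sketch of such a check, with `eps` and the tolerance as illustrative values:

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

// Sketch of a numerical gradient check: compare the analytical gradient
// computed by the backward pass against a central finite difference of a
// scalar loss f with respect to one input element x.
static bool check_gradient(float analytical, const std::function<float(float)> & f,
                           float x, float eps = 1e-3f, float tol = 1e-2f) {
    const float numerical = (f(x + eps) - f(x - eps)) / (2.0f * eps);
    const float err = std::fabs(analytical - numerical) /
                      std::max(1.0f, std::fabs(numerical));
    return err < tol; // relative error within threshold
}
```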