[ LORA ] Update FC Layer to support LoRA's incremental forwarding & batch_size option. #2728
Conversation
- This commit adds code to support LoRA in `incremental_forwarding`.
- This commit updates `incremental_forwarding` to support multi-batch input. However, this is not the desirable approach, since the computation cannot be parallelized across the batch axis. I left a note about this issue in the code comments (see the sketch below).

Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
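For context, the sketch below is a minimal standalone illustration in plain C++ (not nntrainer's Tensor API) of the per-batch loop described above: each batch element is processed sequentially, which is why the LoRA path cannot yet be parallelized across the batch axis. All names and shapes here are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of per-batch LoRA incremental forwarding:
// y_b = x_b * W + lora_scaling * (x_b * A) * B, computed one batch element at a time.
void lora_incremental_forward(const std::vector<float> &x, // [batch, in_dim]
                              const std::vector<float> &W, // [in_dim, out_dim]
                              const std::vector<float> &A, // [in_dim, rank]
                              const std::vector<float> &B, // [rank, out_dim]
                              float lora_scaling,
                              std::vector<float> &y,       // [batch, out_dim]
                              std::size_t batch, std::size_t in_dim,
                              std::size_t rank, std::size_t out_dim) {
  std::vector<float> tmp(rank);
  for (std::size_t b = 0; b < batch; ++b) { // sequential over the batch axis
    const float *xb = &x[b * in_dim];
    float *yb = &y[b * out_dim];
    // base projection: yb = xb * W
    for (std::size_t o = 0; o < out_dim; ++o) {
      float acc = 0.f;
      for (std::size_t i = 0; i < in_dim; ++i)
        acc += xb[i] * W[i * out_dim + o];
      yb[o] = acc;
    }
    // LoRA path: tmp = xb * A, then yb += lora_scaling * (tmp * B)
    for (std::size_t r = 0; r < rank; ++r) {
      float acc = 0.f;
      for (std::size_t i = 0; i < in_dim; ++i)
        acc += xb[i] * A[i * rank + r];
      tmp[r] = acc;
    }
    for (std::size_t o = 0; o < out_dim; ++o) {
      float acc = 0.f;
      for (std::size_t r = 0; r < rank; ++r)
        acc += tmp[r] * B[r * out_dim + o];
      yb[o] += lora_scaling * acc;
    }
  }
}
```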
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2728. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process from reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
input_step.dot(loraA, hidden_tmp_lora, false, false);      // down-projection: tmp = input * A (rank-sized)
hidden_tmp_lora.dot(loraB, hidden_out_lora, false, false); // up-projection: out = tmp * B
hidden_out_lora.multiply_i(lora_scaling);                  // apply the LoRA scaling factor in place
hidden_step.add_i(hidden_out_lora);                        // add the LoRA delta to the base FC output
Pure question:
Is there any possibility for lora_scaling to be defined as a scalar by any chance?
Although it is not supported in nntrainer yet, if it is a scalar, then we can do something like:

hidden_tmp_lora.dot(loraB, hidden_out_lora, false, false, lora_scaling /*alpha*/, 0 /*beta*/);

in the optimized GEMM case, computing everything in fused ops.
(Or, even if it is a vector, I can implement a fused op for optimal performance.)
To do so, we would need to add params like alpha and beta to Tensor::dot() -> AFAIK the current nntrainer Tensor::dot supports beta only...
FYI, a normal GEMM can be defined as C = alpha * op(A) * op(B) + beta * C (see the sketch below).
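To make the fused-op idea concrete, here is a minimal standalone GEMM sketch in plain C++ (not nntrainer's Tensor::dot; its extended signature above is only a proposal): passing lora_scaling as alpha folds the scaling multiply into the matrix product, and passing the base output as C with beta = 1 would even fold the final add.

```cpp
#include <cstddef>

// Standard GEMM: C = alpha * A * B + beta * C (row-major, no transpose).
// With A = hidden_tmp_lora, B = loraB, alpha = lora_scaling and beta = 0,
// this reproduces the proposed fused call; with C = hidden_step and beta = 1,
// the final add_i() would be folded in as well.
void gemm(const float *A, const float *B, float *C,
          std::size_t M, std::size_t K, std::size_t N,
          float alpha, float beta) {
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.f;
      for (std::size_t k = 0; k < K; ++k)
        acc += A[m * K + k] * B[k * N + n];
      // beta == 0 overwrites C, matching the proposed 0 /*beta*/ call.
      C[m * N + n] = alpha * acc + (beta == 0.f ? 0.f : beta * C[m * N + n]);
    }
  }
}
```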
Thank you for asking a question!
The lora_scaling is nothing but a scalar (commonly defined as alpha / rank in LoRA; see the snippet below).
You're asking about this because there's a chance that we might need a generalized GEMM, right? It could be one of the cases where the generalized GEMM is used, but I'm not certain it will be actively utilized.
e.g., the purpose of using lora_scaling is to keep the hyper-parameter consistent across the various LoRA ranks. But I'm not sure it will be used in on-device training (the rank will be fixed, or the best rank might be explored on the PC side).
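For reference, a minimal sketch of the usual LoRA scaling convention; the names below are illustrative, not nntrainer's actual members.

```cpp
// Common LoRA convention: the adapter output is scaled by alpha / rank so that
// the effective update magnitude stays comparable when the rank is changed.
float compute_lora_scaling(float lora_alpha, unsigned int rank) {
  return lora_alpha / static_cast<float>(rank);
}
```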
@EunjuYang, 💯 All CI checkers are successfully verified. Thanks.
nice work! LGTM
Overall LGTM except for a minor comment.
nntrainer/layers/fc_layer.cpp
Outdated
@@ -148,7 +159,7 @@ void FullyConnectedLayer::finalize(InitLayerContext &context) {
  true, TensorLifespan::FORWARD_DERIV_LIFESPAN);
This is not related to this PR, but isn't loraTmp used in forward and gradient, not forward and derivative?
Exactly! I'm gonna update this. Thank you for your comment.
- In the previous code, LoRA didn't work for the case batch_size > 1: the tensors used in LoRA-related computation were not updated when the batch size changed.
- A `setBatch()` function is implemented for `FullyConnectedLayer` (see the sketch below).
- BugFix in the lifespan of the loraTmp tensor: FORWARD_DERIV_LIFESPAN -> FORWARD_GRAD_LIFESPAN.

Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
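As an illustration of the class of bug being fixed (not nntrainer's actual `setBatch()` implementation), here is a standalone sketch of a layer object whose LoRA temporaries depend on the batch size and therefore must be resized whenever the batch changes; all names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layer state holding LoRA temporaries whose size depends on the
// batch size. If setBatch() is not called when the batch changes, the buffers
// keep their old size and the LoRA path silently breaks for batch_size > 1.
struct FcLoraBuffers {
  std::size_t rank, out_dim;
  std::vector<float> lora_tmp; // [batch, rank]
  std::vector<float> lora_out; // [batch, out_dim]

  FcLoraBuffers(std::size_t batch, std::size_t rank_, std::size_t out_dim_)
    : rank(rank_), out_dim(out_dim_) {
    setBatch(batch);
  }

  // Resize every batch-dependent temporary to match the new batch size.
  void setBatch(std::size_t batch) {
    lora_tmp.assign(batch * rank, 0.f);
    lora_out.assign(batch * out_dim, 0.f);
  }
};
```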
Branch updated from 36d6d66 to 7c1b98c.
@EunjuYang, 💯 All CI checkers are successfully verified. Thanks.
This pull request (PR) consists of two commits:
1. Update `incremental_forwarding` for `FullyConnectedLayer`.
2. Fix bugs in `fc_layer.cpp/h` handling of batch size with LoRA.
Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped