
[ LORA ] Update FC Layer to support LoRA's incremental forwarding & batch_size option. #2728

Merged
merged 2 commits into nnstreamer:main on Sep 22, 2024

Conversation

EunjuYang
Contributor

@EunjuYang commented Sep 5, 2024

This pull request (PR) consists of two commits:

  1. Update 'incremental_forwarding' for 'FullyConnectedLayer'.

    • The code now supports multiple batches.
    • The code now works with incremental forwarding when LoRA is enabled (see the sketch after this list).
  2. Fix bugs in 'fc_layer.cpp/h' when handling batch size with LoRA.

    • In the previous version, tensor dimensions used in the LoRA computations did not account for the batch size.
    • The 'setBatch' function has been overridden to correctly update the batch size of these tensors.
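As a rough, self-contained sketch of the idea (assumed names and shapes, not the actual fc_layer.cpp code): incremental forwarding handles one sequence step at a time, and the batch axis is processed with an explicit outer loop, which is also why it is not parallelized across batches.

#include <cstddef>
#include <vector>

// Illustrative sketch only; hidden_step is assumed to already hold input_step * W.
void add_lora_step(std::size_t batch, std::size_t in_dim, std::size_t out_dim,
                   std::size_t rank, float lora_scaling,
                   const std::vector<float> &input_step, // batch x in_dim
                   const std::vector<float> &loraA,      // in_dim x rank
                   const std::vector<float> &loraB,      // rank x out_dim
                   std::vector<float> &hidden_step) {    // batch x out_dim
  std::vector<float> tmp(rank);
  for (std::size_t b = 0; b < batch; ++b) {   // per-sample loop over the batch axis
    const float *x = &input_step[b * in_dim];
    float *h = &hidden_step[b * out_dim];
    for (std::size_t r = 0; r < rank; ++r) {  // tmp = x * loraA
      float acc = 0.0f;
      for (std::size_t i = 0; i < in_dim; ++i)
        acc += x[i] * loraA[i * rank + r];
      tmp[r] = acc;
    }
    for (std::size_t o = 0; o < out_dim; ++o) { // h += lora_scaling * (tmp * loraB)
      float acc = 0.0f;
      for (std::size_t r = 0; r < rank; ++r)
        acc += tmp[r] * loraB[r * out_dim + o];
      h[o] += lora_scaling * acc;
    }
  }
}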

Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

- This commit adds code to support LoRA in incremental_forwarding.
- This commit updates incremental_forwarding to support multi-batch
input. However, this is not the ideal approach, since it cannot be
parallelized across the batch axis. I left a note on this issue in a code comment.

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
@taos-ci
Collaborator

taos-ci commented Sep 5, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2728. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

Comment on lines +275 to +278
input_step.dot(loraA, hidden_tmp_lora, false, false);
hidden_tmp_lora.dot(loraB, hidden_out_lora, false, false);
hidden_out_lora.multiply_i(lora_scaling);
hidden_step.add_i(hidden_out_lora);
Member

@skykongkong8 commented Sep 5, 2024


Pure question:
Is there any chance that lora_scaling could be defined as a scalar?
Although it is not supported in nntrainer yet, if it is a scalar,
then we could do something like:

hidden_tmp_lora.dot(loraB, hidden_out_lora, false, false, lora_scaling /*alpha*/, 0 /*beta*/);

in the optimized GEMM case, computing everything as a fused op.
(Or, even if it is a vector, I can implement a fused op for optimal performance.)

To do so,
we would need to add parameters like alpha and beta to Tensor::dot() -> AFAIK the current nntrainer Tensor::dot supports beta only...

FYI)
a normal GEMM can be defined as:

$$C = \alpha A B + \beta C$$
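For context, a plain standalone illustration of what the fused form computes (raw-array C++, a sketch rather than nntrainer's actual Tensor::dot): applying alpha inside the GEMM removes the separate scaling pass over the output.

#include <cstddef>
#include <vector>

// GEMM with the usual alpha/beta semantics: C = alpha * A * B + beta * C.
void gemm(std::size_t M, std::size_t K, std::size_t N, float alpha,
          const std::vector<float> &A,          // M x K, row-major
          const std::vector<float> &B,          // K x N, row-major
          float beta, std::vector<float> &C) {  // M x N, row-major
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
        acc += A[m * K + k] * B[k * N + n];
      // Scaling is applied once per output element inside the GEMM,
      // so no extra multiply pass over C is needed afterwards.
      C[m * N + n] = alpha * acc + beta * C[m * N + n];
    }
}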

Contributor Author


Thank you for asking!
The lora_scaling is nothing but $$\frac{\alpha}{r}$$, which is a scalar (please refer to https://arxiv.org/abs/2106.09685, Section 4.1). It is used to make it easier to search for the best rank $$r$$.
You're asking because there's a chance we might need a general GEMM, right? That could be one of the cases where the generalized GEMM is used, but I'm not certain it will be actively utilized.

e.g., the purpose of lora_scaling is to keep the hyper-parameter consistent across different LoRA ranks. But I'm not sure it will be used in on-device training (the rank will likely be fixed, or the best rank will be explored on the PC side).
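For reference, the standard LoRA forward pass from the paper, with the scaling written out explicitly (formula only, not code from this PR):

$$h = W_0 x + \frac{\alpha}{r} B A x, \qquad \texttt{lora\_scaling} = \frac{\alpha}{r}$$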

Collaborator

@taos-ci left a comment


@EunjuYang, 💯 All CI checkers are successfully verified. Thanks.

Contributor

@djeong20 left a comment


nice work! LGTM

Contributor

@lhs8928 left a comment


Overall LGTM except for a minor comment.

@@ -148,7 +159,7 @@ void FullyConnectedLayer::finalize(InitLayerContext &context) {
true, TensorLifespan::FORWARD_DERIV_LIFESPAN);
Contributor


This is not related to this PR, but isn't loraTmp used in forward and gradient, not forward and derivative?

Contributor Author


Exactly! I'm gonna update this. Thank you for your comment.

- In the previous code, LoRA didn't work when batch_size > 1.
- Tensors used in LoRA-related computation were not updated when the
batch size was updated.
- The `setBatch()` function is now implemented for `FullyConnectedLayer`
(a sketch of the idea follows after this commit message).
- Bug fix in the lifespan of the loraTmp tensor: FORWARD_DERIV_LIFESPAN ->
FORWARD_GRAD_LIFESPAN

Self evaluation:

	Build test: [X]Passed [ ]Failed [ ]Skipped
	Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
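A minimal illustration of the setBatch fix described in this commit (hypothetical types and member names, not the nntrainer API): every intermediate LoRA buffer whose leading dimension is the batch size has to be re-sized when the batch changes, otherwise the LoRA path keeps stale batch-1 shapes.

#include <cstddef>
#include <vector>

// Hypothetical sketch of the setBatch idea; names and shapes are illustrative only.
struct LoraBuffers {
  std::size_t batch = 1, rank = 4, unit = 8;
  std::vector<float> hidden_tmp_lora; // shape: batch x rank
  std::vector<float> hidden_out_lora; // shape: batch x unit

  // Re-size every batch-dependent tensor whenever the batch size changes.
  void setBatch(std::size_t new_batch) {
    batch = new_batch;
    hidden_tmp_lora.assign(batch * rank, 0.0f);
    hidden_out_lora.assign(batch * unit, 0.0f);
  }
};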
Collaborator

@taos-ci left a comment


@EunjuYang, 💯 All CI checkers are successfully verified. Thanks.

@jijoongmoon jijoongmoon merged commit 8104cbe into nnstreamer:main Sep 22, 2024
37 checks passed
6 participants