
[Loss&Test] Implement KLD Loss & Add Unit Test on loss layers @open sesame 10/28 13:48 #2757

Merged
merged 6 commits into nnstreamer:main from loss_TDD on Nov 7, 2024

Conversation

@DonghakPark (Member) commented Oct 16, 2024:

In this PR

This PR performs the following three tasks:

commit 1 - [UnitTest] Add kld-loss unit test

  • To increase coverage, we added a unit test case for the KLD loss layer using LayerSemanticsParamType.

commit 2 - [UnitTest] Add integration_tests on unit test

  • Given NNTrainer's characteristics, it is difficult to accurately test a single loss or BN layer in isolation, so we created integration tests and added exception cases.

commit 3 - [Loss] Implement KLD Loss Layer

  • PyTorch's KLD loss implementation, for reference (image not shown)
  • KLD loss forwarding formula (image not shown; restated below)
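
For reference, the forwarding formula from commit 3 (a restatement of the code comment in forwarding(); the original images are not reproduced here):

$\mathrm{KLD\_loss} = -\tfrac{1}{2} \sum \left( 1 + \mathrm{log\_std} - \mu^2 - e^{\mathrm{log\_std}} \right)$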

Self evaluation:
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK [email protected]

@taos-ci (Collaborator) commented Oct 16, 2024:

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2757. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review by reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. In order to monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) left a comment:

@DonghakPark, 💯 All CI checkers are successfully verified. Thanks.

@SeoHyungjun (Member) commented:

There is a typo in 'implemented KDL Loss layer as below'.
It should be corrected to 'implemented KLD Loss layer as below.'

Comment on lines 39 to 65
mu.pow(2.0f, temp); // 1. temp = mu ^ 2
log_std.subtract(temp, before_sum); // 2. before_sum = log_std - temp
log_std.apply<float>(expf, temp); // 3. temp = exp(log_std) - 1
temp.subtract_i(1.0f);
before_sum.subtract_i(temp); // 4. before_sum = before_sum - temp
before_sum.sum({1, 2, 3}, ret, -0.5); // 5. sum * 0.5
Member:

I don't fully understand the forward implementation of KLD Loss in commit 3.
Are the written formulas and the actual implementation the same?

I would like to know the difference between the formula on line 32 of kld_loss_layer.cpp and PyTorch's KLD formula added to the PR.

In PyTorch, it is summarized as follows:

if not log_target: # default
    loss_pointwise = target * (target.log() - input)
else:
    loss_pointwise = target.exp() * (target - input)
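
For reference (an addition, not part of the original comment), the default (non-log-target) branch above computes the pointwise loss

$\ell_i = \mathrm{target}_i \left( \log \mathrm{target}_i - \mathrm{input}_i \right),$

so with input $= \log Q$ and target $= P$ the summed loss is $\sum_i P_i \log(P_i / Q_i) = KL(P \,\|\, Q)$; that is, PyTorch expects the model output to already be log-probabilities.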

Member Author:

I should have provided more detailed explanations. Upon reviewing your feedback, it seems necessary to align with other frameworks' usage. Thank you for bringing this to my attention.

Member:

Eunju provided additional explanation of the logic in the comment below.
I think it would be great if we made a decision with our team members based on that opinion!

Comment on lines 39 to 65
mu.pow(2.0f, temp); // 1. temp = mu ^ 2
log_std.subtract(temp, before_sum); // 2. before_sum = log_std - temp
log_std.apply<float>(expf, temp); // 3. temp = exp(log_std) - 1
temp.subtract_i(1.0f);
before_sum.subtract_i(temp); // 4. before_sum = before_sum - temp
before_sum.sum({1, 2, 3}, ret, -0.5); // 5. sum * 0.5
Member:

The comment says "sum * 0.5".
However, in the code, "-0.5" is used.
Is this correct?

Member Author:

Nice catch! -0.5 is correct; in the code I apply -0.5.


void KLDLossLayer::forwarding(RunLayerContext &context, bool training) {}
void KLDLossLayer::forwarding(RunLayerContext &context, bool training) {
// -0.5 * sum(1 + log_std - pow(mu, 2) - exp(log_std))
Member:

If it is the same as torch's KLD formula, it would be great if you could write down the derivation process of the formula like in the comment for the calcDerivative function.

Member Author:

Okay, I will fix it.


void KLDLossLayer::forwarding(RunLayerContext &context, bool training) {}
void KLDLossLayer::forwarding(RunLayerContext &context, bool training) {
// -0.5 * sum(1 + log_std - pow(mu, 2) - exp(log_std))
@EunjuYang (Contributor) commented Oct 16, 2024:

I agree with Hyungjun. Torch's KLD loss has its own assumptions: (1) it is the reverse KL divergence, and (2) it assumes the value from the model is already in log space.

For general KL divergence,

# KL(P||Q) = \sum { P (log P - log Q) }
(P * (P / Q).log()).sum()

How to use PyTorch's KLD loss:

# in general, kld loss is used in the form of
torch.nn.functional.kl_div(Q.log(), P, None, None, 'sum')

Like this, we need to clarify the assumptions on how to use this loss function, including the input definition. These kinds of explanations are missing in this update; in order to review this PR correctly, some explanations are required.

Also, I couldn't understand why you used the notation mu and log_std; I think it is not equivalent to the general KL loss. Do you assume a Gaussian KLD loss (e.g., the VAE KL loss)? It seems this KLD loss is a specific one that computes the KLD between a normal distribution and the standard normal.
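
For context (a hypothetical check, not part of the PR), the kl_div call above matches the textbook $KL(P\|Q) = \sum P \log(P/Q)$ when the model output is passed in log space; P and Q below are made-up example distributions:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
P = torch.softmax(torch.randn(4, 8), dim=-1)  # target distribution (label)
Q = torch.softmax(torch.randn(4, 8), dim=-1)  # model distribution (prediction)

manual = (P * (P / Q).log()).sum()                   # textbook KL(P||Q)
via_kl_div = F.kl_div(Q.log(), P, reduction='sum')   # input must already be log-probabilities

print(torch.allclose(manual, via_kl_div))  # True, up to floating-point tolerance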

Member Author:

Thank you for your detailed review. Based on the formula below:
https://stackoverflow.com/questions/74865368/kl-divergence-loss-equation

(formula image not shown)

I have attached the Torch formula to assist in understanding. As Hyungjun and Eunju mentioned, it seems necessary to discuss which assumptions we need in order to implement this.

Contributor:

Thank you for your detailed explanation. As I mentioned in my comment, however, it seems the forwarding of the KLD loss only covers the special case of $KL(p(x)||q(x))$, where $q(x) \sim \mathcal{N}(\mu, \sigma^2)$ and $p(x) \sim \mathcal{N}(0,1)$.
The link you shared refers to code that considers the special case of the KL loss for training a VAE (here). I recommend considering an update to make our KLD loss cover the more general form :)
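
For reference (an addition, not part of the original comment), the standard closed form of that Gaussian special case, which the code comment in forwarding() matches, is the VAE-style KL term

$KL\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, 1)\big) = -\tfrac{1}{2} \sum \big( 1 + \log\sigma^2 - \mu^2 - \sigma^2 \big),$

with $\mathrm{log\_std} = \log\sigma^2$.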

Member Author:

Oh, I see.
I will update the KLD loss to cover more general cases.

@DonghakPark DonghakPark force-pushed the loss_TDD branch 2 times, most recently from 2f24a4e to 3475994 Compare October 16, 2024 23:44
@taos-ci (Collaborator) commented Oct 17, 2024:

:octocat: cibot: @DonghakPark, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2757-202410170844270.1929669380188-3475994cecf485dfba4d34fa1aeab25e0bb0695e/.

@taos-ci (Collaborator) left a comment:

@DonghakPark, 💯 All CI checkers are successfully verified. Thanks.

@EunjuYang (Contributor) left a comment:

Thank you for considering my previous comment.
Please check some suggestions below :)

Comment on lines 32 to 41
/**
* 1. Output = label / predicted
* 2. Output = output * label
* 3. Output = log(output)
* 4. Output = sum(output)
*/
label.divide(predicted, temp);
temp.multiply_i(label);
temp.apply<float>(logf, temp);
output.fill(temp.sum({0, 1, 2, 3}));
Contributor:

The code seems to compute (P * (P / Q)).log(), which is different from P * (P / Q).log().
Please change the code as follows:

Suggested change:

/**
 * 1. Output = label / predicted
 * 2. Output = log(Output)
 * 3. Output = Output * label
 * 4. Output = sum(output)
 */
label.divide(predicted, temp);
temp.apply<float>(logf, temp);
temp.multiply_i(label);
output.fill(temp.sum({0, 1, 2, 3}));
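
A hypothetical numeric check (not part of the PR) illustrating why the order matters: applying log after multiplying by the label yields $\sum \log(P^2/Q)$ instead of $KL(P\|Q) = \sum P \log(P/Q)$; P and Q are made-up example distributions:

import torch

P = torch.tensor([0.2, 0.3, 0.5])  # label
Q = torch.tensor([0.4, 0.4, 0.2])  # predicted

before = (P * (P / Q)).log().sum()  # original order: multiply by label, then take log
after = (P * (P / Q).log()).sum()   # suggested order: take log of the ratio, then multiply by label

print(before.item(), after.item())  # the two values differ; only `after` equals KL(P||Q)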

Member Author:

Thank you for the check!

Comment on lines 54 to 55
label.divide_i(predicted);
deriv.fill(label);
Contributor:

I think it is just a matter of coding style, but here is a simple suggestion from my side:

Suggested change:

label.divide(predicted, deriv);

Member Author:

Good point, I will apply your review.

@DonghakPark DonghakPark force-pushed the loss_TDD branch 2 times, most recently from b3e0d9a to 799f131 Compare October 23, 2024 08:36
@taos-ci (Collaborator) commented Oct 23, 2024:

:octocat: cibot: @DonghakPark, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2757-202410231736100.5796000957489-799f131355f83f2286d6282d4c341211a970fac7/.

@EunjuYang (Contributor) left a comment:

LGTM!

@taos-ci (Collaborator) left a comment:

@DonghakPark, 💯 All CI checkers are successfully verified. Thanks.

@skykongkong8 (Member) left a comment:

I think it's good to go!

Commits

Currently, there is no unit test for the KLD loss. Therefore, I am adding a new unit test to increase code coverage.
- use the predefined unit test (LayerSemanticsParamType)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
In the case of NNTrainer, there are many instances where one function works in response to another, so we have created integration tests for unit testing.
- There are exceptional cases for losses depending on the activation.
- For instance, with BN (Batch Normalization), you can compare accuracy after running several epochs of testing.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Implement KLD Loss on NNTrainer

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Apply coding style with clang-format (unittest files)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Implement Finalize on KLD Loss

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Update KLD Loss function to reflect review

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
@taos-ci (Collaborator) left a comment:

@DonghakPark, 💯 All CI checkers are successfully verified. Thanks.

@DonghakPark DonghakPark changed the title [Loss&Test] Implement KLD Loss & Add Unit Test on loss layers @open sesame 10/17 17:36 [Loss&Test] Implement KLD Loss & Add Unit Test on loss layers @open sesame 10/28 13:48 Oct 28, 2024
@taos-ci (Collaborator) left a comment:

@DonghakPark, 💯 All CI checkers are successfully verified. Thanks.

@jijoongmoon (Collaborator) left a comment:

LGTM

@djeong20 (Contributor) left a comment:

Great work!

@jijoongmoon jijoongmoon merged commit 27448b6 into nnstreamer:main Nov 7, 2024
45 checks passed
8 participants