[Wait for #2615] Enable Mixed Precision Training in NNTrainer @open sesame 11/09 15:18 #2663
Conversation
Enable Mixed Precision on Pooling 2D Layer - I modified the existing pooling 2d layer to properly cast tensors for the FP16 case so that the mixed precision path can be activated on it. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Donghak PARK <[email protected]>
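Not the actual pooling2d_layer code, just a minimal sketch of the casting idea with a hypothetical helper: when the window data is stored in half precision, each element is promoted to FP32 before accumulating, so the pooling result is computed in full precision.

```cpp
// Hypothetical helper, for illustration only. T may be a half-precision type
// (e.g. _Float16) or float; accumulation always happens in FP32.
template <typename T>
float average_pool_window(const T *in, unsigned int len) {
  float sum = 0.0f;                   // accumulate in full precision
  for (unsigned int i = 0; i < len; ++i)
    sum += static_cast<float>(in[i]); // cast each FP16 element up to FP32
  return sum / static_cast<float>(len);
}
```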
In this PR, when we compute the l2norm of the gradient tensor, it is converted to full precision before computing the l2norm for gradient clipping. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR adds the mu and var backup tensors (mu_b, var_b) to restore the previous moving mean and moving variance for mixed precision training. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
In order to restore the previous iteration data, this PR disables randomization of the mask when the previous data needs to be restored. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR enables the check for whether the previous data needs to be restored. By doing this, we can remove the NaN or Inf data in the Tensor for mixed precision training. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
We need to remove NaN or Inf values in the Tensor by calling setZero(). However, if we use sscal, the NaN or Inf values still remain (multiplying them by zero still yields NaN). This PR changes the sscal call to memset. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR fixes some bugs that occur when running mixed precision training. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
Adding an is_mixed variable to check whether this is mixed precision training, i.e., the weight type of the model is not full precision. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
In the mixed precision computation of the bn layer, there is a bug related to the f32 computation. The Adam update also has a bug. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR includes changes in Android.mk to use builddir/android_build_result. In order to use it, a soft link to the android_build_result dir is necessary in the upper dir (../): ln -s ../../builddir/android_build_result ../nntrainer Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR includes fixes to use TensorV2 Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
5d723ab to ea4dd22
It's been a while since this PR was introduced. I think it is time to go.
for (unsigned int i = 0; i < N; ++i)
  Y[i * incY] = X[i * incX];
No worries, I can just implement it again. Let's leave it like this for now, then.
In this PR
This PR finalizes the mixed precision support in NNTrainer.
It modifies the network graph, the layer node, and the layer implementations. However, it does not yet support mixed precision training for all layers.
It supports:
- Layers: input, fully_connected, activation, dropout, multiout, concat, lstm, reshape, permute, conv1d, conv2d, addition, batch normalization
- Loss: mse
- Optimizer: adam
Enabling mixed precision training for the other operations will definitely follow.
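As a rough picture of how these pieces fit together, here is a hedged sketch of one loss-scaled training step against a hypothetical interface. The names are illustrative only; the real flow lives in the network graph and layer node changes of this PR.

```cpp
// Hypothetical interface used only for this sketch.
struct MixedPrecisionModel {
  virtual ~MixedPrecisionModel() = default;
  virtual void forward() = 0;
  virtual void backward(float loss_scale) = 0;        // FP16 grads carry the scale
  virtual bool gradients_have_nan_or_inf() const = 0; // overflow check
  virtual void restore_previous_data() = 0;           // BN stats, dropout masks, ...
  virtual void apply_gradients(float inv_scale) = 0;  // update FP32 master weights
};

// One loss-scaled step: on overflow the update is skipped and the previous
// state restored; otherwise gradients are unscaled and applied.
void mixed_precision_step(MixedPrecisionModel &m, float &loss_scale) {
  m.forward();
  m.backward(loss_scale);
  if (m.gradients_have_nan_or_inf()) {
    m.restore_previous_data();
    loss_scale *= 0.5f; // back off after overflow
  } else {
    m.apply_gradients(1.0f / loss_scale);
    loss_scale *= 2.0f; // typically grown back only after many clean steps
  }
}
```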
Self evaluation:
Signed-off-by: jijoong.moon <[email protected]>
Commits to be reviewed in this PR
[ Model ] Fix the gradient clipping for the FP16 or Low bit Gradient
In this PR, when we compute the l2norm of the gradient tensor, it is converted to full precision before computing the l2norm for gradient clipping.
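Not the nntrainer Tensor API, just a hedged sketch of why the conversion matters: squaring FP16 values can overflow (the half-precision maximum is about 65504), so the norm is accumulated in FP32.

```cpp
#include <cmath>
#include <vector>

// Sketch: promote each gradient element to FP32 before squaring and summing.
template <typename HalfT>
float grad_l2norm_fp32(const std::vector<HalfT> &grad) {
  float sum_sq = 0.0f;
  for (const auto &g : grad) {
    float gf = static_cast<float>(g); // convert to full precision first
    sum_sq += gf * gf;                // squares stay within FP32 range
  }
  return std::sqrt(sum_sq);
}
```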
[ Layer ] Add mu and var backup tensors.
This PR adds the mu and var backup tensors (mu_b, var_b) to restore the previous moving mean and moving variance for mixed precision training.
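A sketch with hypothetical member names (not the actual bn_layer implementation) of how the backup tensors are meant to be used: back up the running statistics before each update so they can be rolled back if the mixed-precision step is rejected.

```cpp
#include <vector>

// Back up the running statistics before the update; restore them when a
// step is rejected because of NaN/Inf gradients.
struct RunningStats {
  std::vector<float> mu, var;     // moving mean / moving variance
  std::vector<float> mu_b, var_b; // backups taken before each update

  void backup() { mu_b = mu; var_b = var; }
  void restore() { mu = mu_b; var = var_b; }
};
```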
[ Layer ] Prevent randomization when restoring the data
In order to restore the previous iteration data, this PR disables randomization of the mask when the previous data needs to be restored.
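A control-flow sketch with hypothetical names: when the iteration is replayed to restore previous data, the existing mask is reused instead of being randomized again, so the repeated forward pass matches the original one.

```cpp
#include <random>
#include <vector>

void apply_dropout_mask(std::vector<float> &mask, bool restoring_previous_data,
                        float dropout_rate, std::mt19937 &rng) {
  if (!restoring_previous_data) {
    std::bernoulli_distribution keep(1.0 - dropout_rate);
    for (auto &m : mask)
      m = keep(rng) ? 1.0f : 0.0f; // draw a fresh mask only on a new iteration
  }
  // when restoring, the mask from the previous iteration is kept as-is
}
```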
[ Context ] Add check for whether previous data needs to be restored
This PR enables the check for whether the previous data needs to be restored. By doing this, we can remove the NaN or Inf data in the Tensor for mixed precision training.
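A sketch (not the actual RunLayerContext API) of what such a check can look like: the gradients are scanned for NaN or Inf, which signals that the loss-scaled backward pass overflowed and the previous data must be restored.

```cpp
#include <cmath>
#include <vector>

// Returns true when any gradient element is NaN or +/-Inf.
bool needs_restore(const std::vector<float> &grad) {
  for (float g : grad)
    if (!std::isfinite(g)) // catches both NaN and Inf
      return true;
  return false;
}
```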
[ Tensor ] remove sscal to set zero.
We need to remove NaN or Inf values in the Tensor by calling setZero(). However, if we use sscal, the NaN or Inf values still remain. This PR changes the sscal call to memset.
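A short illustration of why the change is needed: under IEEE 754, 0 * NaN is still NaN (and 0 * Inf is NaN as well), so scaling the buffer by zero with sscal leaves the bad values in place, while memset overwrites the bytes with an all-zero pattern, which is exactly +0.0f.

```cpp
#include <cstddef>
#include <cstring>

// Scaling by zero does not clear NaN/Inf entries.
void set_zero_by_scale(float *buf, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    buf[i] *= 0.0f;                       // NaN/Inf entries remain NaN
}

// Overwriting the bytes really clears them.
void set_zero_by_memset(float *buf, std::size_t n) {
  std::memset(buf, 0, n * sizeof(float)); // every entry becomes exactly 0.0f
}
```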
[ Mixed ] Set gradient initialization in layers and bug fixes
This PR fixes some bugs that occur when running mixed precision training.
[ Mixed Training ] add is_mixed variable in weight
Adding an is_mixed variable to check whether this is mixed precision training, i.e., the weight type of the model is not full precision.
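A sketch with hypothetical names (not the actual Weight class): a weight counts as mixed when the type it is stored in differs from the FP32 full precision used for the optimizer's master copy.

```cpp
// Illustrative types only.
enum class Tdatatype { FP16, FP32 };

struct WeightSketch {
  Tdatatype dtype; // type the weight data is actually stored in
  bool is_mixed() const { return dtype != Tdatatype::FP32; }
};
```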
[ BUG FIX ] Fix bug for mixed precision
In the mixed precision computation of the bn layer, there is a bug related to the f32 computation. The Adam update also has a bug.
[TEST] using builddir/android_build_result to build test
This PR includes changes in Android.mk to use builddir/android_build_result. In order to use it, a soft link to the android_build_result dir is necessary in the upper dir (../):
ln -s ../../builddir/android_build_result ../nntrainer