-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scale factor for each channel #350
Conversation
b0e1f07
to
d8486d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have forgotten but is this the feature that @jorendumoulin needs?
|
||
// set the shared bitpacked shift | ||
csrw_ss(SIMD_SHARED_BITPACKED_SHIFT0, shared_bitpacked_shift0); | ||
csrw_ss(SIMD_SHARED_BITPACKED_SHIFT1, shared_bitpacked_shift1); | ||
|
||
// set the shared multipliers | ||
csrw_ss(SIMD_SHARED_MULTIPLIER0, shared_multiplier0); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER1, shared_multiplier1); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER2, shared_multiplier2); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER3, shared_multiplier3); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER4, shared_multiplier4); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER5, shared_multiplier5); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER6, shared_multiplier6); | ||
csrw_ss(SIMD_SHARED_MULTIPLIER7, shared_multiplier7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL now there are much more CSRs to configure. As more features are added, we are getting close to NVDLA registers hahahaha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. There are a lot of control registers needed especially if we want to support per channel quantization. We can retrain the models with per tensor quantization but there might be some accuracy lost.
Do you want to try implement it in systematic way, which is using DataPathExtension? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally this looks good!
Just triple check that there are different scaling factors for every K dimension - but this looks allright
Just one question: if there are more than 8 output channels, there are more than 8 scaling factors, and we would need to reprogram them during the operation.
I am planning to do it like this, do you think it is possible? (if i am correct the output dimension of conv is at N?)
configure_streamer(M=2,N=2,K=2)
start_streamer()
configure_gemm(M=2,N=1,K=2)
start_gemm()
wait_gemm()
configure_gemm(M=2,N=1,K=2) -> only new scaling factors here, other params stay the same
start_gemm()
wait_gemm()
wait_streamer()
No, I don't think this is possible. You can't program the CSRs to the accelerator when the accelerator is doing current computation (busy). With the help of CsrManager, you can program the CSRs for the next operation (multi-cycles) and store them inside the register of the CsrManager. These CSRs can be sent to the accelerator when it is idle (one cycle). We can only set N=1 then. Your proposal can work I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allright thank you!
Apart from @IveanEx 's suggestion, which I think would be really nice, this looks good to me
int32_t gen_csr0_config(uint8_t input_zp_i, uint8_t output_zp_i, | ||
uint8_t shift_i, uint8_t max_int_i) { | ||
uint8_t max_int_i, uint8_t min_int_i) { | ||
// encode the configuration into a single 32-bit integer | ||
return ((int32_t)max_int_i << 24) | ((int32_t)shift_i << 16) | | ||
return ((int32_t)min_int_i << 24) | ((int32_t)max_int_i << 16) | | ||
((int32_t)output_zp_i << 8) | (int32_t)input_zp_i; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work for negative integers? I think the sign bits will then overflow into each other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works for negative integers as the test indicates.
The min_int_i is always -127, the max_int_i is 128.
input_zp_i and output_zp_i can be negative integers.
Good idea. But it needs much more extra work. We can leave it to the next big refactor. |
As the failing test only related to #353, I will merge this PR now! |
In this PR, we add scale factors for each channel, which is 8 for the GeMMX. Specifically, we add:
Note: This is only a functional fix of the SIMD. The critical path issue will be addressed later.
A bug fix of pipe parameter in GeMMX!