Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scale factor for each channel #350

Merged
merged 5 commits into from
Sep 30, 2024

Conversation

xiaoling-yi
Copy link
Collaborator

@xiaoling-yi xiaoling-yi commented Sep 27, 2024

In this PR, we add scale factors for each channel, which is 8 for the GeMMX. Specifically, we add:

  • supporting logic in the SIMD, mainly added more CSRs and the control signals to assign the correct CSR to correct PE
  • relevant changes in hw cfg and gemmx wrapper with regard to the increased number of CSRs
  • relevant software modification, mainly added more CSRs
  • relevant test and golden data generation. The software test performs channel-wise rescale now
  • some automatic scala file formatting

Note: This is only a functional fix of the SIMD. The critical path issue will be addressed later.
A bug fix of pipe parameter in GeMMX!

@xiaoling-yi xiaoling-yi force-pushed the xyi/add-scale-factor-for-each-channel branch from b0e1f07 to d8486d6 Compare September 27, 2024 10:10
Copy link

@rgantonio rgantonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have forgotten but is this the feature that @jorendumoulin needs?

Comment on lines +191 to +204

// set the shared bitpacked shift
csrw_ss(SIMD_SHARED_BITPACKED_SHIFT0, shared_bitpacked_shift0);
csrw_ss(SIMD_SHARED_BITPACKED_SHIFT1, shared_bitpacked_shift1);

// set the shared multipliers
csrw_ss(SIMD_SHARED_MULTIPLIER0, shared_multiplier0);
csrw_ss(SIMD_SHARED_MULTIPLIER1, shared_multiplier1);
csrw_ss(SIMD_SHARED_MULTIPLIER2, shared_multiplier2);
csrw_ss(SIMD_SHARED_MULTIPLIER3, shared_multiplier3);
csrw_ss(SIMD_SHARED_MULTIPLIER4, shared_multiplier4);
csrw_ss(SIMD_SHARED_MULTIPLIER5, shared_multiplier5);
csrw_ss(SIMD_SHARED_MULTIPLIER6, shared_multiplier6);
csrw_ss(SIMD_SHARED_MULTIPLIER7, shared_multiplier7);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL now there are much more CSRs to configure. As more features are added, we are getting close to NVDLA registers hahahaha.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. There are a lot of control registers needed especially if we want to support per channel quantization. We can retrain the models with per tensor quantization but there might be some accuracy lost.

@IveanEx
Copy link

IveanEx commented Sep 27, 2024

Do you want to try implement it in systematic way, which is using DataPathExtension? ☺️

Copy link

@jorendumoulin jorendumoulin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this looks good!

Just triple check that there are different scaling factors for every K dimension - but this looks allright

Just one question: if there are more than 8 output channels, there are more than 8 scaling factors, and we would need to reprogram them during the operation.

I am planning to do it like this, do you think it is possible? (if i am correct the output dimension of conv is at N?)

configure_streamer(M=2,N=2,K=2)
start_streamer()
configure_gemm(M=2,N=1,K=2)
start_gemm()
wait_gemm()
configure_gemm(M=2,N=1,K=2) -> only new scaling factors here, other params stay the same
start_gemm()
wait_gemm()
wait_streamer()

@xiaoling-yi
Copy link
Collaborator Author

xiaoling-yi commented Sep 27, 2024

Just one question: if there are more than 8 output channels, there are more than 8 scaling factors, and we would need to reprogram them during the operation.

No, I don't think this is possible. You can't program the CSRs to the accelerator when the accelerator is doing current computation (busy). With the help of CsrManager, you can program the CSRs for the next operation (multi-cycles) and store them inside the register of the CsrManager. These CSRs can be sent to the accelerator when it is idle (one cycle).

We can only set N=1 then. Your proposal can work I think.

Copy link

@jorendumoulin jorendumoulin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allright thank you!
Apart from @IveanEx 's suggestion, which I think would be really nice, this looks good to me

Comment on lines 23 to 29
int32_t gen_csr0_config(uint8_t input_zp_i, uint8_t output_zp_i,
uint8_t shift_i, uint8_t max_int_i) {
uint8_t max_int_i, uint8_t min_int_i) {
// encode the configuration into a single 32-bit integer
return ((int32_t)max_int_i << 24) | ((int32_t)shift_i << 16) |
return ((int32_t)min_int_i << 24) | ((int32_t)max_int_i << 16) |
((int32_t)output_zp_i << 8) | (int32_t)input_zp_i;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work for negative integers? I think the sign bits will then overflow into each other

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works for negative integers as the test indicates.
The min_int_i is always -127, the max_int_i is 128.
input_zp_i and output_zp_i can be negative integers.

@xiaoling-yi
Copy link
Collaborator Author

Do you want to try implement it in systematic way, which is using DataPathExtension? ☺️

Good idea. But it needs much more extra work. We can leave it to the next big refactor.

@xiaoling-yi
Copy link
Collaborator Author

As the failing test only related to #353, I will merge this PR now!

@xiaoling-yi xiaoling-yi merged commit 4a283cb into main Sep 30, 2024
22 of 23 checks passed
@xiaoling-yi xiaoling-yi deleted the xyi/add-scale-factor-for-each-channel branch September 30, 2024 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants