Add scale factor for each channel #350

xiaoling-yi · 2024-09-27T09:56:26Z

This PR adds a scale factor for each channel (8 channels for the GeMMX). Specifically, we add:

- supporting logic in the SIMD: more CSRs, plus the control signals that assign the correct CSR to the correct PE
- the corresponding changes in the hw cfg and the gemmx wrapper for the increased number of CSRs
- the corresponding software changes, mainly the additional CSRs
- updated tests and golden data generation; the software test now performs a channel-wise rescale
- some automatic Scala file formatting

Note: This is only a functional fix of the SIMD. The critical path issue will be addressed later.
This PR also includes a bug fix for the pipe parameter in GeMMX!

rgantonio

I may have forgotten but is this the feature that @jorendumoulin needs?

rgantonio · 2024-09-27T10:17:21Z

target/snitch_cluster/sw/snax/gemmx/src/snax-gemmx-lib.c

+
+    // set the shared bitpacked shift
+    csrw_ss(SIMD_SHARED_BITPACKED_SHIFT0, shared_bitpacked_shift0);
+    csrw_ss(SIMD_SHARED_BITPACKED_SHIFT1, shared_bitpacked_shift1);
+
+    // set the shared multipliers
+    csrw_ss(SIMD_SHARED_MULTIPLIER0, shared_multiplier0);
+    csrw_ss(SIMD_SHARED_MULTIPLIER1, shared_multiplier1);
+    csrw_ss(SIMD_SHARED_MULTIPLIER2, shared_multiplier2);
+    csrw_ss(SIMD_SHARED_MULTIPLIER3, shared_multiplier3);
+    csrw_ss(SIMD_SHARED_MULTIPLIER4, shared_multiplier4);
+    csrw_ss(SIMD_SHARED_MULTIPLIER5, shared_multiplier5);
+    csrw_ss(SIMD_SHARED_MULTIPLIER6, shared_multiplier6);
+    csrw_ss(SIMD_SHARED_MULTIPLIER7, shared_multiplier7);


LOL now there are many more CSRs to configure. As more features are added, we are getting close to NVDLA registers hahahaha.

You are right. A lot of control registers are needed, especially if we want to support per-channel quantization. We could retrain the models with per-tensor quantization, but there might be some accuracy loss.
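To make the per-channel versus per-tensor distinction concrete, here is a minimal reference sketch of a channel-wise rescale in C. It is an illustration only: the function names, the round-to-nearest scheme, and the 64-bit intermediate are assumptions, not the arithmetic implemented by the GeMMX SIMD. Per-tensor quantization corresponds to filling every entry of `multiplier` and `shift` with the same value.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 8  /* the PR description states 8 channels for the GeMMX */

/* Hypothetical reference model: rescale one 32-bit accumulator with the
 * multiplier and shift that belong to its channel, then clamp to 8 bits. */
static int8_t rescale_channel(int32_t acc, int32_t multiplier, uint8_t shift,
                              int8_t output_zp, int32_t min_int,
                              int32_t max_int) {
    assert(shift >= 1 && shift <= 31);
    assert(min_int >= INT8_MIN && max_int <= INT8_MAX && min_int <= max_int);

    /* Widen to 64 bits so the product cannot overflow. */
    int64_t scaled = (int64_t)acc * (int64_t)multiplier;

    /* Round to nearest by adding half the divisor before shifting.
     * Assumption: >> on a negative value is an arithmetic shift, which is
     * implementation-defined in C but holds on mainstream compilers. */
    scaled = (scaled + ((int64_t)1 << (shift - 1))) >> shift;

    scaled += output_zp;
    if (scaled < min_int) scaled = min_int;
    if (scaled > max_int) scaled = max_int;
    return (int8_t)scaled;
}

/* Apply a separate scale factor to each of the NUM_CHANNELS outputs. */
static void rescale_per_channel(const int32_t acc[NUM_CHANNELS],
                                const int32_t multiplier[NUM_CHANNELS],
                                const uint8_t shift[NUM_CHANNELS],
                                int8_t output_zp, int32_t min_int,
                                int32_t max_int, int8_t out[NUM_CHANNELS]) {
    for (int c = 0; c < NUM_CHANNELS; c++) {
        out[c] = rescale_channel(acc[c], multiplier[c], shift[c], output_zp,
                                 min_int, max_int);
    }
}
```

Each channel needs its own multiplier and shift, which is why the number of configuration registers grows with the number of channels.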

IveanEx · 2024-09-27T13:09:38Z

Do you want to try implementing it in a systematic way, i.e. using DataPathExtension? ☺️

jorendumoulin

Functionally this looks good!

Just triple-check that there are different scaling factors for every K dimension, but this looks all right.

Just one question: if there are more than 8 output channels, there are more than 8 scaling factors, and we would need to reprogram them during the operation.

I am planning to do it like this. Do you think that is possible? (If I am correct, the output dimension of the conv is at N?)

configure_streamer(M=2,N=2,K=2)
start_streamer()
configure_gemm(M=2,N=1,K=2)
start_gemm()
wait_gemm()
configure_gemm(M=2,N=1,K=2) -> only new scaling factors here, other params stay the same
start_gemm()
wait_gemm()
wait_streamer()
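The sequence above could be generalized to any number of output tiles roughly as follows. This is a sketch only: `set_scale_factors`, `start_gemm`, `wait_gemm`, and the bookkeeping arrays are hypothetical stand-ins that model the calls in software, not functions from the SNAX library. The streamer would be configured and started once before the loop and waited on once after it.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 8 /* scale factors the accelerator holds at once */
#define MAX_RUNS 16    /* capacity of the bookkeeping log below */

/* Hypothetical software model of the scale-factor CSRs. */
static int32_t csr_multiplier[NUM_CHANNELS];

/* Log of the first multiplier that was active for each GEMM run. */
static int32_t first_multiplier_seen[MAX_RUNS];
static int gemm_runs = 0;

/* Stand-in for reprogramming only the scale factors. */
static void set_scale_factors(const int32_t *multipliers) {
    for (int c = 0; c < NUM_CHANNELS; c++) {
        csr_multiplier[c] = multipliers[c];
    }
}

/* Stand-in for start_gemm(): record which factors this run used. */
static void start_gemm(void) {
    assert(gemm_runs < MAX_RUNS);
    first_multiplier_seen[gemm_runs++] = csr_multiplier[0];
}

/* Stand-in for wait_gemm(): the model completes immediately. */
static void wait_gemm(void) {}

/* Run one GEMM per group of NUM_CHANNELS output channels, loading a new
 * set of scale factors before each run. */
static void run_output_tiles(const int32_t *multipliers, int num_tiles) {
    /* configure_streamer() / start_streamer() would be called once here */
    for (int t = 0; t < num_tiles; t++) {
        set_scale_factors(&multipliers[t * NUM_CHANNELS]);
        start_gemm();
        wait_gemm();
    }
    /* wait_streamer() would be called once here */
}
```

With 16 output channels, for example, `run_output_tiles(multipliers, 2)` would perform two GEMM runs, the second one with multipliers 8 to 15.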

xiaoling-yi · 2024-09-27T18:38:01Z

Just one question: if there are more than 8 output channels, there are more than 8 scaling factors, and we would need to reprogram them during the operation.

No, I don't think this is possible. You can't write CSRs into the accelerator while it is busy with the current computation. With the help of the CsrManager, you can program the CSRs for the next operation (which takes multiple cycles) and store them in the CsrManager's registers. These CSRs are then sent to the accelerator once it is idle (in one cycle).

We can only set N=1 then. Your proposal can work I think.
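The staging behaviour described here can be pictured with a small behavioural model in C. All names below (`csr_manager_model_t`, `csr_write`, `csr_commit`, `accelerator_done`) are hypothetical, and the model only mirrors the description in this comment; it is not the CsrManager hardware implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_CSRS 4 /* arbitrary size for the illustration */

/* Hypothetical model: writes land in a staging copy, and the accelerator
 * only sees them when the whole set is handed over while it is idle. */
typedef struct {
    uint32_t staged[NUM_CSRS]; /* registers held inside the manager */
    uint32_t active[NUM_CSRS]; /* configuration the accelerator uses */
    bool busy;                 /* true while a computation is running */
} csr_manager_model_t;

/* Program one CSR for the next operation; allowed even while busy. */
static void csr_write(csr_manager_model_t *m, unsigned idx, uint32_t value) {
    assert(idx < NUM_CSRS);
    m->staged[idx] = value;
}

/* Hand the staged set to the accelerator and start it.
 * Returns false, leaving the active set untouched, if it is still busy. */
static bool csr_commit(csr_manager_model_t *m) {
    if (m->busy) {
        return false;
    }
    memcpy(m->active, m->staged, sizeof m->active);
    m->busy = true;
    return true;
}

/* Mark the current computation as finished. */
static void accelerator_done(csr_manager_model_t *m) { m->busy = false; }
```

In this model the next set of scale factors can be written while a GEMM run is in flight, but it only takes effect at the start of the following run, which matches the N=1 plan above.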

jorendumoulin

All right, thank you!
Apart from @IveanEx's suggestion, which I think would be really nice, this looks good to me.

jorendumoulin · 2024-09-30T09:52:30Z

target/snitch_cluster/sw/snax/gemmx/src/snax-gemmx-lib.c

 int32_t gen_csr0_config(uint8_t input_zp_i, uint8_t output_zp_i,
-                        uint8_t shift_i, uint8_t max_int_i) {
+                        uint8_t max_int_i, uint8_t min_int_i) {
    // encode the configuration into a single 32-bit integer
-    return ((int32_t)max_int_i << 24) | ((int32_t)shift_i << 16) |
+    return ((int32_t)min_int_i << 24) | ((int32_t)max_int_i << 16) |
           ((int32_t)output_zp_i << 8) | (int32_t)input_zp_i;
 }



Does this work for negative integers? I think the sign bits would then overflow into each other.

It works for negative integers as the test indicates.
The min_int_i is always -127, the max_int_i is 128.
input_zp_i and output_zp_i can be negative integers.
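A short sketch of why the packing is safe with `uint8_t` parameters. A negative value passed to a `uint8_t` parameter is converted to its 8-bit two's-complement pattern before it is widened, so no sign extension reaches the neighbouring fields. The helper names below are hypothetical. The sketch also uses `uint32_t` for the shifts, because shifting a signed `int32_t` value of 128 or more left by 24 moves a bit into the sign position, which is formally undefined behaviour in C even though it usually works in practice.

```c
#include <assert.h>
#include <stdint.h>

/* Packing with SIGNED parameters: each negative byte is sign-extended to
 * 32 bits first, so its upper 1-bits spill into the other fields. */
static uint32_t pack_signed_naive(int8_t b3, int8_t b2, int8_t b1, int8_t b0) {
    return ((uint32_t)(int32_t)b3 << 24) | ((uint32_t)(int32_t)b2 << 16) |
           ((uint32_t)(int32_t)b1 << 8) | (uint32_t)(int32_t)b0;
}

/* Packing with UNSIGNED byte parameters, as in gen_csr0_config: each field
 * stays within its own 8 bits. Unsigned shifts avoid the signed-overflow
 * problem described above. */
static uint32_t pack_bytes(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0) {
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8) | (uint32_t)b0;
}

/* Read field idx (0 = lowest byte) back as a signed 8-bit value.
 * Assumption: converting a uint8_t above 127 to int8_t wraps around, which
 * is implementation-defined in C but holds on two's-complement targets. */
static int8_t unpack_field(uint32_t word, unsigned idx) {
    assert(idx < 4);
    return (int8_t)(uint8_t)(word >> (8u * idx));
}
```

Packing -128, 127, -3, and -5 with `pack_bytes` gives `0x807FFDFB`, and every field reads back unchanged, whereas `pack_signed_naive` collapses the same inputs to `0xFFFFFFFB`.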

xiaoling-yi · 2024-09-30T10:57:09Z

Do you want to try implementing it in a systematic way, i.e. using DataPathExtension? ☺️

Good idea, but it needs a lot of extra work. We can leave it for the next big refactor.

xiaoling-yi · 2024-09-30T10:58:24Z

As the failing test is only related to #353, I will merge this PR now!

xiaoling-yi requested review from rgantonio, IveanEx and jorendumoulin September 27, 2024 10:09

xiaoling-yi added 3 commits September 27, 2024 12:10

add scale factor for each channel

156d54f

fix ctrl_csr_set_num calculation

d80ae1a

py fmt

d8486d6

xiaoling-yi force-pushed the xyi/add-scale-factor-for-each-channel branch from b0e1f07 to d8486d6 September 27, 2024 10:10

rgantonio reviewed Sep 27, 2024

jorendumoulin reviewed Sep 27, 2024

xiaoling-yi added 2 commits September 27, 2024 20:42

update csr num in gemmx-xdma

36f7a1e

update gemmx-xdma cfg to latest

65c45c1

jorendumoulin approved these changes Sep 30, 2024

xiaoling-yi merged commit 4a283cb into main Sep 30, 2024
22 of 23 checks passed

xiaoling-yi deleted the xyi/add-scale-factor-for-each-channel branch September 30, 2024 11:04
