
Different output between CPU and Metal with the im2col operator #931

Closed
lovemefan opened this issue Aug 26, 2024 · 6 comments

@lovemefan

While debugging my custom model, I noticed that the im2col operator produces different results on Metal than on the CPU.

My device is an M1 Pro.

The relevant code is at https://github.com/lovemefan/SenseVoice.cpp/blob/1dfa60459a0104a68066ddf7c275f4c0a33972a6/sense-voice/csrc/sense-voice-encoder.cc#L267-L269

struct ggml_tensor * im2col = ggml_im2col(ctx0, new_a,                                  // kernel
               ggml_reshape_4d(ctx0, b, b->ne[0], 1, b->ne[1], b->ne[2] * b->ne[3]),    // input, reshaped to 4D
               1, 0, padding, 0, 1, 0,                                                  // s0, s1, p0, p1, d0, d1
               false, GGML_TYPE_F16);                                                   // is_2D, dst_type
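
For context, a standalone comparison of the two backends could look roughly like this. This is only a sketch: it assumes the ggml backend API of this ggml version, uses p0 = 5 in place of my `padding` variable, and takes the shapes of the fsmn_block kernel {11, 1, 512, 1} and input {187, 1, 512, 1} that appear in the logs below.

#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

static void run_im2col(ggml_backend_t backend) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead()*8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,  // tensor data is allocated in the backend buffer
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a   = ggml_new_tensor_4d(ctx, GGML_TYPE_F16,  11, 1, 512, 1); // kernel
    struct ggml_tensor * b   = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 187, 1, 512, 1); // input
    struct ggml_tensor * out = ggml_im2col(ctx, a, b, 1, 0, 5, 0, 1, 0, false, GGML_TYPE_F16);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);

    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);
    // ... fill a and b with the same deterministic data via ggml_backend_tensor_set ...
    ggml_backend_graph_compute(backend, gf);
    // ... read `out` back with ggml_backend_tensor_get and diff CPU vs Metal ...

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
}

int main() {
    ggml_backend_t cpu   = ggml_backend_cpu_init();
    ggml_backend_t metal = ggml_backend_metal_init();
    run_im2col(cpu);
    run_im2col(metal);
    ggml_backend_free(metal);
    ggml_backend_free(cpu);
    return 0;
}

Filling a and b with the same deterministic data on both backends and diffing the two out buffers would isolate the operator from the rest of the graph.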

Here is my callback log of the tensor; the im2col node's name is node55:

cpu.log
metal.log

I would appreciate your help with this. Thank you!

@JohannesGaessler
Collaborator

Do the im2col tests in test-backend-ops pass?

@lovemefan
Author

lovemefan commented Aug 27, 2024

Do the im2col tests in test-backend-ops pass?

All tests passed with -DGGML_METAL=OFF.

One test failed with -DGGML_METAL=ON:

96% tests passed, 1 tests failed out of 24

Total Test time (real) =  86.43 sec

The following tests FAILED:
         16 - test-conv-transpose-1d (Subprocess aborted)

@ggerganov
Owner

The test-conv-transpose-1d binary fails on Mac because we don't have a Metal implementation yet.

@JohannesGaessler was asking for the result from test-backend-ops and they currently pass:

make -j && ./bin//test-backend-ops -o IM2COL
Backend 1/2 (CPU)
  Skipping CPU backend
Backend 2/2 (Metal)
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
  Backend name: Metal
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
  1344/1344 tests passed
  Backend Metal: OK

ggml_metal_free: deallocating
2/2 backends passed
OK

@lovemefan Try to add a test to test-backend-ops.cpp that fails and work from there.

@lovemefan
Author

The tests pass, but I can't reproduce my bug by adding an example to test-backend-ops. I spent some time troubleshooting but couldn't find anything.

Testing 2 backends

Backend 1/2 (CPU)
  Backend name: CPU
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): ggml_backend_register: registered backend CPU
ggml_backend_register: registered backend Metal
OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
  IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[20,1,2,1],ne_kernel=[11,1,2,1],s0=1,s1=0,p0=5,p1=0,d0=1,d1=0,is_2D=0): OK
  1345/1345 tests passed
  Backend CPU: OK

Backend 2/2 (Metal)
  Skipping
2/2 backends passed
OK
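
The extra case is the ne_input=[20,1,2,1] 1D one in the run above. In test-backend-ops.cpp it corresponds to roughly the following (a sketch: the test_im2col constructor layout is inferred from the parameter string the test harness prints):

    // the added 1D case, mirroring the IM2COL(...ne_input=[20,1,2,1]...) line above;
    // the test_im2col argument order is inferred from the printed parameter names
    test_cases.emplace_back(new test_im2col(
            GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F32, // type_input, type_kernel, dst_type
            {20, 1, 2, 1},                               // ne_input
            {11, 1, 2, 1},                               // ne_kernel
            1, 0, 5, 0, 1, 0,                            // s0, s1, p0, p1, d0, d1
            false));                                     // is_2D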

In my code, I noticed the output tensor looks like random data.

node_56 = (f32)     IM2COL(encoder.encoders0.0.self_attn.fsmn_block.weight (reshaped){11, 1, 512, 1}, attention_V (transposed) (cont) (reshaped) (cont){187, 1, 512, 1}}) = {11, 187, 512, 1}
                                     [
                                      [
                                       [      0.1077,       0.0867,       0.0815, ...,       0.0939,       0.0617,       0.0004],
                                       [      0.0509,      -0.0198,       0.0252, ...,       0.0589,       0.0110,      -0.0662],
                                       [     -0.0691,      -0.1348,      -0.1385, ...,      -0.1628,      -0.1450,      -0.1191],
                                       ..., 
                                       [      0.0251,      -0.0317,      -0.0558, ...,      -0.1676,      -0.1620,      -0.0997],
                                       [     -0.1455,      -0.0764,      -0.1159, ...,      -0.0249,       0.0146,       0.0687],
                                       [      0.1323,       0.1453,       0.1234, ...,       0.2963,       0.1976,       0.1235],
                                      ],
                                      [
                                       [      0.3266,       0.3900,       0.3218, ...,       0.0291,       0.0054,      -0.0033],
                                       [      0.0183,       0.0379,       0.1066, ...,       0.2731,       0.2905,       0.3599],
                                       [      0.4786,       0.4105,       0.4773, ...,       0.2667,       0.3345,       0.2860],
                                       ..., 
                                       [      0.2771,       0.2001,       0.2363, ...,       0.1060,       0.1247,       0.0417],
                                       [      0.0584,       0.1006,       0.1617, ...,      -0.0197,      -0.0852,      -0.0765],
                                       [     -0.0800,      -0.0915,      -0.1155, ...,      -0.2073,      -0.2165,      -0.1289],
                                      ],
                                      [
                                       [     -0.1064,      -0.1542,      -0.1333, ...,      -0.0538,       0.0158,       0.0972],
                                       [      0.1904,       0.2528,       0.2121, ...,       0.4930,       0.4395,       0.2486],
                                       [     -0.0395,       0.0909,       0.2529, ...,       0.0461,       0.0395,      -0.0260],
                                       ..., 
                                       [      0.0827,       0.0932,       0.1548, ...,       0.1713,       0.0767,       0.1357],
                                       [      0.1237,       0.1689,       0.2138, ...,       0.1194,       0.1599,       0.2138],
                                       [      0.4326,       0.3982,       0.2848, ...,       0.2005,       0.3298,       0.2315],
                                      ],
                                      ..., 
                                      [
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       ..., 
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                      ],
                                      [
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       ..., 
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                      ],
                                      [
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       ..., 
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                      ],
                                     ]
                                     sum = 10.121820

When the dst type is set to GGML_TYPE_F16, NaNs appear.

ggml_debug:                  node_56 = (f16)     IM2COL(encoder.encoders0.0.self_attn.fsmn_block.weight (reshaped){11, 1, 512, 1}, attention_V (transposed) (cont) (reshaped) (cont){187, 1, 512, 1}}) = {11, 187, 512, 1}
                                     [
                                      [
                                       [     -0.0001,       1.4648,      -0.0001, ...,   -1530.0000,       1.5137,   57856.0000],
                                       [      1.5762,   39104.0000,       1.5830, ...,       1.3711,     -40.0000,       0.7363],
                                       [     -0.0138,       1.3281,       0.0001, ...,       0.0350,       1.4951,      -1.8301],
                                       ..., 
                                       [      0.0001,       1.5469,      -6.4375, ...,     -33.4688,       1.5557,       0.5938],
                                       [      0.5054,      -0.0015,       1.5127, ...,       1.4238,      -0.0002,       1.4258],
                                       [     -5.2070,       1.5615,       8.8516, ...,       0.4473,      -1.3477,       0.6865],
                                      ],
                                      [
                                       [     -1.4102,      -2.2832,      -1.4033, ...,      -1.2461,   45632.0000,      -1.5615],
                                       [     -0.0000,      -1.4502,       0.0019, ...,   16544.0000,      -1.5928,   32320.0000],
                                       [     -1.5186,       0.0350,      -1.5273, ...,       1.3389,   -3242.0000,       1.2646],
                                       ..., 
                                       [     -1.4404,      14.8672,      -1.2959, ...,       1.1074,      -0.0106,       1.3867],
                                       [         nan,       1.5068,     -49.8125, ...,       0.0016,      -1.4277,      -8.9766],
                                       [      1.4160,      -0.0000,       1.6074, ...,       1.5723,     -62.8750,       1.4961],
                                      ],
                                      [
                                       [      0.4175,       1.6631,      -0.2144, ...,    -121.3750,       1.4053,      -0.4783],
                                       [      1.5615,       0.0000,       1.5938, ...,       0.9614,       0.0028,      -0.9175],
                                       [         nan,       1.1455,       0.0018, ...,    7648.0000,      -1.0283,      -0.6484],
                                       ..., 
                                       [     -0.1702,      -1.4219,      56.3750, ...,       0.0184,      -1.4717,      -9.1172],
                                       [     -1.5010,   22800.0000,      -1.5039, ...,      -1.5908,     -11.7031,      -1.6396],
                                       [     -0.1598,      -1.4199,      -0.0002, ...,    4010.0000,       1.6562,    1384.0000],
                                      ],
                                      ..., 
                                      [
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       ..., 
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                      ],
                                      [
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       ..., 
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                      ],
                                      [
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       ..., 
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                       [      0.0000,       0.0000,       0.0000, ...,       0.0000,       0.0000,       0.0000],
                                      ],
                                     ]
                                     sum = nan

What can I do next to provide more information?
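
One thing I could do is count NaNs and out-of-range values directly in the debug callback. Here is a sketch, assuming the usual (tensor, ask, user_data) eval-callback signature; `debug_im2col` and the 100.0f threshold are illustrative:

#include <cmath>
#include <cstdio>
#include <cstring>
#include <vector>
#include "ggml.h"
#include "ggml-backend.h"

// Illustrative eval callback: copy the im2col node back from the backend and
// count NaNs / implausibly large magnitudes, which would point at garbage
// memory rather than ordinary CPU-vs-Metal rounding differences.
static bool debug_im2col(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return strcmp(t->name, "node_56") == 0; // only inspect the im2col node
    }
    const int64_t n = ggml_nelements(t);
    std::vector<ggml_fp16_t> data(n);
    ggml_backend_tensor_get(t, data.data(), 0, n*sizeof(ggml_fp16_t));

    int64_t n_nan = 0, n_big = 0;
    for (int64_t i = 0; i < n; ++i) {
        const float v = ggml_fp16_to_fp32(data[i]);
        if (std::isnan(v)) {
            n_nan++;
        } else if (std::fabs(v) > 100.0f) {
            n_big++; // far outside the expected activation range
        }
    }
    printf("%s: %lld NaN, %lld |v| > 100, out of %lld values\n",
           t->name, (long long) n_nan, (long long) n_big, (long long) n);
    return true; // keep running the graph
}

Large counts of huge magnitudes on Metal but not on CPU would suggest uninitialized or out-of-bounds memory rather than rounding noise.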

@lovemefan
Author

This bug is the same as #991, but occurring with Metal.

@lovemefan
Author

Fixed in llama/9943.
