Numerical issue with chatglm2.vmfb model #15661
Comments
@jinchen62 I can't assign you because you're not in the org, but this is what we were discussing you engaging on. |
Related issue #15665 |
To reproduce this error with the binary:
|
Further debug steps:
The original 6.4G chatglm-6b-int4.mlir
After running each dispatch, the NaN issue first comes out with module_forward_dispatch_9.mlir.
To run the chatglm.vmfb for module_forward_dispatch_9.mlir, change chatglm.py line 170 with:
The output is:
In https://gist.github.com/manishghop/529225d5e7e609b679f53fc4272be05c
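For reference, a minimal sketch of loading and running the generated vmfb with the IREE Python runtime and checking the outputs for NaN, which is roughly what the modified chatglm.py is doing here; the function name, driver, and dummy input below are assumptions, not taken from the script:

    import numpy as np
    import iree.runtime as ireert

    # Load the compiled module on the CPU driver (driver name is an assumption).
    module = ireert.load_vm_flatbuffer_file("chatglm.vmfb", driver="local-task")

    # Hypothetical input: the real token ids/shape come from the tokenizer in chatglm.py.
    input_ids = np.zeros((1, 32), dtype=np.int64)

    result = module.forward(input_ids)
    outs = result if isinstance(result, (list, tuple)) else [result]
    for i, r in enumerate(outs):
        arr = np.asarray(r)
        print(i, "NaN:", np.isnan(arr).any(), "INF:", np.isinf(arr).any())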
@hanhanW Could you provide some guidance on what's going on in dispatch_9 and where to fix it in IREE? |
Can you help untangle the issue from SHARK? I think we need a simpler repro. The first step could be uploading the MLIR file somewhere and attaching a link to the issue. The next step is that you can pass the flag. The input seems to be critical in this issue, so the next step is to generate inputs for the smaller repro. You can follow the tips to get the smaller reproducer. Note that it will print many values to stderr during execution if we pass the flag. Feel free to ping me if you run into any issues. |
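A hedged sketch of the flow suggested above, assuming the elided flag is `--iree-flow-trace-dispatch-tensors` (the usual way to print every dispatch's inputs/outputs to stderr at run time); file names and the llvm-cpu backend are placeholders:

    # Compile with dispatch tracing enabled (flag choice is an assumption).
    iree-compile chatglm-6b-int4.mlir \
      --iree-hal-target-backends=llvm-cpu \
      --iree-flow-trace-dispatch-tensors \
      -o chatglm_traced.vmfb

    # Run and capture the per-dispatch tensor dumps from stderr.
    iree-run-module --module=chatglm_traced.vmfb --function=forward \
      --input=@inputs.npy 2> dispatch_trace.txt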
@hanhanW Weird, I ran the prebuilt binary successfully this morning.
iree-run-module stops here:
|
I am seeing the error at 6a60b64:
It looks like we need to regenerate the MLIR file? |
Based on the log, I think we const-eval a NaN and it becomes an input, so the issue could be in another dispatch.
Are you able to get the dispatch? I think it will show up if you pass the flag. |
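If the goal is to see the dispatch sources themselves, one option (an assumption about the elided flag) is to dump them at compile time:

    # Writes one .mlir file per dispatch executable into the given directory.
    iree-compile chatglm-6b-int4.mlir \
      --iree-hal-target-backends=llvm-cpu \
      --iree-hal-dump-executable-sources-to=dispatch_sources/ \
      -o /dev/null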
I tried to rerun chatglm.py with nothing changed. It shows the same issue we came across yesterday. Could you download and run it? It should generate the MLIR quickly, compared to the round trip of: I run it >> download to my local system >> upload to a Google bucket >> you download/upload it again to your VM.
|
I just looked at the chatglm.py code; the MLIR is directly generated and saved by torch_mlir.compile. It shouldn't change between runs. |
Here you go: chatglm_dispatch.mlir. I also give the cmd I ran:
Debug steps with this info:
|
Thank you! I can reproduce the issue starting with the dispatch:

    #map = affine_map<(d0) -> (d0)>
    func.func @main(%0: tensor<32xf16>) -> tensor<32xf16> {
      %cst = arith.constant 1.000000e+04 : f16
      %cst_0 = arith.constant 0.000000e+00 : f16
      %cst_1 = arith.constant 1.000000e+00 : f16
      %1 = tensor.empty() : tensor<32xf16>
      %2 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%0 : tensor<32xf16>) outs(%1 : tensor<32xf16>) {
      ^bb0(%in: f16, %out: f16):
        %3 = math.powf %cst, %in : f16
        %4 = arith.cmpf one, %3, %cst_0 : f16
        cf.assert %4, "unimplemented: tensor with zero element"
        %5 = arith.divf %cst_1, %3 : f16
        linalg.yield %5 : f16
      } -> tensor<32xf16>
      return %2 : tensor<32xf16>
    }

Compile to vmfb:
Run the module:
Then I got the output:

    EXEC @main
    result[0]: hal.buffer_view
    32xf16=-NAN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I am taking a look at the dispatch. |
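The compile and run commands are elided in the comment above; a hedged reconstruction of them (the splat input below is a placeholder, not the input actually used in the thread):

    # Compile the standalone dispatch for CPU (backend choice is an assumption).
    iree-compile repro.mlir --iree-hal-target-backends=llvm-cpu -o repro.vmfb

    # Run it; "32xf16=1" splats 1.0 across the 32-element input (placeholder values).
    iree-run-module --module=repro.vmfb --function=main --input="32xf16=1"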
I think there is a bug in the PolynomialApproximation pass; we have a wrong approximation for powf. I stripped the dispatch so it only has a single powf op, e.g.,

    #map = affine_map<(d0) -> (d0)>
    module {
      func.func @main(%arg0: tensor<32xf16>) -> tensor<32xf16> {
        %cst = arith.constant 1.000000e+04 : f16
        %cst_0 = arith.constant 0.000000e+00 : f16
        %cst_1 = arith.constant 1.000000e+00 : f16
        %0 = tensor.empty() : tensor<32xf16>
        %1 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%arg0 : tensor<32xf16>) outs(%0 : tensor<32xf16>) {
        ^bb0(%in: f16, %out: f16):
          %2 = math.powf %cst, %in : f16
          linalg.yield %2 : f16
        } -> tensor<32xf16>
        return %1 : tensor<32xf16>
      }
    }

Running it with the input returns:
If I comment out the pass, we can get reasonable outputs:
The implementation is at https://github.com/llvm/llvm-project/blob/2a9d8caf29ca2b2cf4758db31c64fd20cb5eb3bf/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp#L165-L192. @bviyer @rsuderman can you help review whether the approximation is correct? |
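For comparison, a reference for what the stripped repro should compute can be sketched in numpy by evaluating 10000**x in f32 and rounding to f16. The input vector here is hypothetical (the one used above is not shown), but for exponents in [0, 1] the results stay finite, so a correct f16 lowering of math.powf should not produce NaN/INF on it:

    import numpy as np

    # Hypothetical exponents; rotary-embedding style inputs are typically in [0, 1).
    x = np.linspace(0.0, 1.0, 32, dtype=np.float32)

    # Reference: compute in f32, then round to f16. All values remain finite.
    ref = np.float16(np.float32(10000.0) ** x)
    print(ref)
    print("finite:", np.isfinite(ref).all())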
I have a workaround for the issue: #15927. We can remove the workaround after fixing the polynomial approximation issue. |
There is a bug in polynomial approximation. It generates `NAN` and `INF` for fp16 types. This is a workaround to get it functional. See #15661 for more details. Also reworks the maximumf test: the generic op is not a common input because it uses `outs` while there are no reduction loops.
Python test FAIL. Details are here: chatglm_fail_1214.txt |
I think you are running into new issues. The MLIR file was regenerated, and we cannot compile it using the IREE main branch. It crashes in: |
Downloading the model now. I'll try to repro once it's downloaded. Is there a specific |
chatglm.py should be enough; it would be better to use chatglm.py to repeat the error locally. It will download the model from Hugging Face and use torch_mlir.compile to generate and save the MLIR model as chatglm-6b-int4.mlir, then use shark_module.save_module to run iree-compile. If you look at chatglm_fail_log_1214.txt line 611, there is an equivalent iree-compile cmd you can use:
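The actual command from the log is elided here; it presumably looks roughly like the following sketch, where every flag (including the input type) is a placeholder that should be checked against chatglm_fail_log_1214.txt line 611:

    iree-compile chatglm-6b-int4.mlir \
      --iree-input-type=tm_tensor \
      --iree-hal-target-backends=llvm-cpu \
      -o chatglm-6b-int4.vmfb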
@AmosLewis I am getting this same error even when generating with chatglm.py:
Is there something I need to do other than running the script with ToM SHARK? |
@AmosLewis Can you try generating with a fresh venv on ToM shark if you haven't already? We aren't able to reproduce the error you're hitting, and I want to make sure we have the same environment and versions for everything. |
I have seen this error. You can |
I have listed the venv and iree version info in the comments of chatglm_fail_1214.txt |
@AmosLewis Thanks for pointing me to that info! I was able to reproduce and fix the issue on my side. The quantized matmul reassociation wasn't meant to support f16, but was not failing gracefully. I went ahead and added f16 support with #15964, and I was able to compile the model. Let me know if you still have any issues after picking this. |
Thanks. I will try your patch on my side. Could you also run the vmfb with this run_chatglm.py on your side? It tries to run the
With all the previous fixes (#15927 and #15964), the compile error is fixed but the NaN issue still exists.
|
Can you triage the issue the way we've done above, and attach a reproducer like #15661 (comment)? |
Here is what I got: chatglm_fail_log_dispatch9_1218_with_max_15964.txt. It still breaks at dispatch9, but got stuck here for about 40 mins at INF this time. I appended the repro steps in the comments as well. |
It looks like other dispatches generate NaN/INF first.
This should navigate you to the first place that generates NaN/INF. |
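A rough sketch of scanning a dispatch trace dump for the first NaN/INF; the trace file name and the exact line format are assumptions (the script just looks for "nan"/"inf" tokens and reports the most recent dispatch name seen):

    import re

    current_dispatch = "<unknown>"
    with open("dispatch_trace.txt") as f:          # stderr captured from iree-run-module
        for lineno, line in enumerate(f, 1):
            # Heuristic: trace lines naming a dispatch contain e.g. "forward_dispatch_9".
            m = re.search(r"\w*dispatch_\d+\w*", line)
            if m:
                current_dispatch = m.group(0)
            if re.search(r"\b(nan|inf)\b", line, re.IGNORECASE):
                print(f"first NaN/INF at line {lineno} (dispatch: {current_dispatch})")
                break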
I didn't find any dispatch outputting INF up to dispatch 8. I also tried to print the annotation here: 1218_chatglm_forward9-dispatch-tensors-annotation.mlir, then searched for jit_eval_8_dispatch_0_generic_4x4_f32xf16xf16. |
I know what's happening... This is happening in the const-eval stage, so all the inputs for these dispatches are constant data. It means that either the frontend generates invalid constants or IREE reads the weights incorrectly. There are two things on my mind:
|
If the weight is in f64 type and we can't represent it using f32 type, it could become INF.
If the original weight is invalid, the bug is in the model itself. |
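A quick numpy illustration of the first case, where an f64 weight that is out of f32 range turns into INF on conversion:

    import numpy as np

    w64 = np.float64(1.0e40)        # representable as f64
    w32 = np.float32(w64)           # out of f32 range -> inf
    print(w64, w32, np.isinf(w32))  # 1e+40 inf True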
I just elided the input:
|
I have to go. One other thing we can try is adding the flag (we should also rename the flag -- I will take a look tomorrow).
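If the flag being referred to is the one that disables constant evaluation (an assumption on my part), the experiment would be something like:

    # Skip the const-eval pipeline so the suspect dispatches run at inference time
    # instead of being folded into constants (flag choice is an assumption).
    iree-compile chatglm-6b-int4.mlir \
      --iree-hal-target-backends=llvm-cpu \
      --iree-opt-const-eval=false \
      -o chatglm_no_consteval.vmfb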
Hello, @hanhanW |
Update: we can run the model without NaN on Cascade Lake in a clean build. Perhaps it can only be reproduced on a Haswell CPU. I'm setting up an env on @AmosLewis's VM to see if I can reproduce the issue. |
I am able to produce reasonable output even on the same VM if I don't use the flag. My experiments show that it is the root cause of the NaN: it produces NaN only if I add the flag. I don't know why it is added, but can we exclude the flag for now? |
It looks like we are adding it here in SHARK: https://github.com/nod-ai/SHARK/blob/788cc9157c942a4c6f73e3a85f16b14c9ce4d4d5/shark/iree_utils/compile_utils.py#L46. @dan-garvey @monorimet can you help disable it in SHARK? |
Yeah, we don't want to be adding this flag for anything other than llama2 on CPU. It is needed for llama2 performance, but it is still experimental. |
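For reference, a hedged guess at what "excluding the flag" looks like when compiling manually, assuming the flag in question is the quantized-matmul-reassociation option mentioned earlier (the exact spelling below is an assumption and should be checked against compile_utils.py):

    # Compile without the experimental reassociation flag; everything else as before.
    iree-compile chatglm-6b-int4.mlir \
      --iree-hal-target-backends=llvm-cpu \
      -o chatglm-6b-int4.vmfb
    # (i.e. simply drop --iree-global-opt-enable-quantized-matmul-reassociation
    #  from the flag list built in shark/iree_utils/compile_utils.py)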
Using SHARK with this commit, nod-ai/SHARK-Studio#2047, should fix the NaN issue. Could you try it, @manishghop?
|
Related issue: |
What happened?
I'm able to compile the PyTorch model into MLIR and then convert the MLIR model into a vmfb file:
I used this code for compilation : https://gist.github.com/manishghop/55c741b5734b6f3fb041111a4b9be695
But while running inference I get a NaN error:
I used this code to run the inference : https://gist.github.com/manishghop/529225d5e7e609b679f53fc4272be05c
Steps to reproduce your issue
3.1. set-executionpolicy remotesigned
3.2. Run the setup_venv.ps1 from: https://github.com/nod-ai/SHARK
What component(s) does this issue relate to?
Runtime
Version information
No response
Additional context
No response