torch.aten.native_layer_norm to linalg #569
Reproing with:

```sh
HF_TOKEN=... python run.py --torchmlirbuild /path/to/torch-mlir/build --ireebuild /path/to/iree-build --cachedir ~/.cache/huggingface --tests pytorch/models/deit-small-distilled-patch16-224 -r test-onnx --tolerance .001 .001 --mode onnx --report
```

---
My attempt at a minimal repro example: for some reason there is no compile error on this; it succeeds without an issue.
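A sketch of roughly what such a minimal repro could look like (my illustration, not the original snippet; the shapes, value names, and eps are assumptions based on the deit model's ?x198x384 activations): a `torch.aten.native_layer_norm` over the last dim of an input with a dynamic batch dim.

```mlir
// hypothetical minimal repro: layer norm over the trailing 384-element dim
// of an input whose leading (batch) dim is dynamic
func.func @main(%input: !torch.vtensor<[?,198,384],f32>,
                %weight: !torch.vtensor<[384],f32>,
                %bias: !torch.vtensor<[384],f32>) -> !torch.vtensor<[?,198,384],f32> {
  %int384 = torch.constant.int 384
  %eps = torch.constant.float 1.000000e-05
  %shape = torch.prim.ListConstruct %int384 : (!torch.int) -> !torch.list<int>
  %out, %mean, %rstd = torch.aten.native_layer_norm %input, %shape, %weight, %bias, %eps : !torch.vtensor<[?,198,384],f32>, !torch.list<int>, !torch.vtensor<[384],f32>, !torch.vtensor<[384],f32>, !torch.float -> !torch.vtensor<[?,198,384],f32>, !torch.vtensor<[?,198,1],f32>, !torch.vtensor<[?,198,1],f32>
  return %out : !torch.vtensor<[?,198,384],f32>
}
```

---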
Repro log with:

---
IR dump after failure (with the failing line highlighted). Relevant section:

---
TLDR: looks like we generate the problematic tensor.expand_shape during one of the passes in the pipeline. Print-ir-after-all produces a 15MB text file, but we can grep it.
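For example, a quick filter (a sketch; the dump filename is a placeholder):

```sh
# find every tensor.expand_shape plus the pass headers, so the offending
# op can be attributed to the pass that introduced it
grep -nE 'tensor\.expand_shape|IR Dump After' printirafterall.mlir
```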
---
With the minimal IR, it correctly preserves the `<?x198xf32>`. See output at: https://gist.github.com/renxida/e4347a1ef027e9bf7ed3487e7d87d577. Command:

---
A search for the error message indicates that it is generated by a shape-compatibility check shared by MLIR's reshape ops. But this particular check:

```cpp
if (dynamicShape) {
  if (!ShapedType::isDynamic(collapsedShape[map.index()])) {
    return emitError(
        "expected dimension " + Twine(map.index()) +
        " of collapsed type to be dynamic since one or more of the "
        "corresponding dimensions in the expanded type is dynamic");
  }
}
```

may not make sense: I can see a world where we allow collapsing unknown-shape dims.
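To make the check concrete: whenever any dimension in a reassociation group of the expanded type is dynamic, it requires the corresponding collapsed dimension to be dynamic too. A minimal example that trips it (my illustration, not IR from the failing model):

```mlir
// Invalid: group [0, 1] of the expanded type ?x198 contains a dynamic dim,
// so dim 0 of the collapsed type must be '?', but it is the static value 198.
// The verifier emits: "expected dimension 0 of collapsed type to be dynamic ..."
%1 = tensor.collapse_shape %0 [[0, 1]] : tensor<?x198xf32> into tensor<198xf32>
```

The same shared check runs for tensor.expand_shape, with the source playing the collapsed role and the result the expanded one.

---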
https://gist.github.com/renxida/d39d9dd77991836d620737e57161d4a3 --compile-to=input |
Previously I thought the problem was introduced by a particular pass, but then, after slogging through it a bunch with Stanley, I noticed a couple of weird things in the IR from right before the expand_shape:

---
Looks like (1) from the last comment is introduced right after memref-expand.

Full logs at https://gist.github.com/renxida/4e1fdcf2cd2b04a9001462b45096323d
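To pull just that section out of the big dump, something like this works (a sketch; `full_log.mlir` is a placeholder, and it relies on the `IR Dump After ExpandOps (memref-expand)` header format seen in these dumps):

```sh
# print the IR between the memref-expand dump header and the next pass header
awk '/IR Dump After ExpandOps \(memref-expand\)/{p=1; next}
     p && /IR Dump After/{exit}
     p' full_log.mlir
```

---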
Got sidetracked from (1) a little bit. There are a lot of linalg generics that seem to turn known shapes into unknown shapes. Result:

```mlir
// -----// IR Dump After ConvertTorchToLinalg (convert-torch-to-linalg) //----- //
%533 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%cast_641, %cast_153 : tensor<1x198x384xf32>, tensor<?x198x384xf32>) outs(%532 : tensor<?x198x384xf32>) {
%855 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%cast_1398, %cast_906 : tensor<1x198x384xf32>, tensor<?x198x384xf32>) outs(%854 : tensor<?x198x384xf32>) {
```

There are things like this that come relatively early in the pipeline (right after convert-torch-to-linalg). These make sense because they're broadcasting an elementwise op between a 1x198x384 and a ?x198x384 to produce a ?x198x384. What doesn't make sense is when a single-input linalg.generic turns a known shape into an unknown one. Need to filter specifically for those:

```sh
cat printirafterall.mlir | grep -E 'ins(%\d+tensor<1x198x?x.outs.?x198|IR Dump'
```

---
Grepping specifically for one-input linalg.generic ops that convert known shapes into unknown shapes:
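A filter along these lines does it (a sketch; the filename and dims are assumptions):

```sh
# match generics with exactly one SSA value inside ins(...) whose input has the
# static 1x198 prefix but whose output is dynamic; keep pass headers for context
grep -nE 'ins\([^,)]* : tensor<1x198[^)]*\) outs\([^)]*\?x198|IR Dump After' printirafterall.mlir
```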
---
This is an issue caused by incomplete shape inference during canonicalization: it assigned 1 to dim 0 of the input but not of the output, while operating on a linalg.generic op that really just reduce-sums the last dim. After ExpandOps and before canonicalization, the IR section looks like this (full file here):

```mlir
%338 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, 0)>], iterator_types = ["parallel", "parallel", "reduction"]} ins(%cast_173 : tensor<?x198x384xf32>) outs(%337 : tensor<?x?x1xf32>) {
^bb0(%in: f32, %out: f32):
  %4260 = arith.addf %in, %out : f32
  linalg.yield %4260 : f32
} -> tensor<?x?x1xf32>
```

The canonicalization pass that runs after ExpandOps then fills in the input shapes as follows:
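Based on that description, the op after canonicalization plausibly looks something like this (a reconstruction, not the actual dump; the tensor.cast and value names are assumptions):

```mlir
// dim 0 of the input has been refined to the static value 1 ...
%cast = tensor.cast %cast_173 : tensor<?x198x384xf32> to tensor<1x198x384xf32>
// ... but the output type was left untouched, so a single-input generic now
// maps the known dims (1, 198) to unknown dims (?, ?)
%338 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, 0)>], iterator_types = ["parallel", "parallel", "reduction"]} ins(%cast : tensor<1x198x384xf32>) outs(%337 : tensor<?x?x1xf32>) {
^bb0(%in: f32, %out: f32):
  %4260 = arith.addf %in, %out : f32
  linalg.yield %4260 : f32
} -> tensor<?x?x1xf32>
```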
The eventual tensor.expand_shape error is caused by this 1->? mismatch. So now the question is: is this linalg.generic not supposed to have a 1->? transition at all, or is the UnitExtentDims pass supposed to be able to handle the 1->? case?
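For context, a rough sketch (my illustration, not from the logs) of what the unit-extent-dims folding does: when it can prove a dimension is statically 1, it drops that dimension via tensor.collapse_shape and later rebuilds the original type with a matching tensor.expand_shape.

```mlir
// a statically-known unit dim can be folded away:
func.func @fold_unit_dim(%t: tensor<1x198x1xf32>) -> tensor<198xf32> {
  %c = tensor.collapse_shape %t [[0, 1, 2]] : tensor<1x198x1xf32> into tensor<198xf32>
  return %c : tensor<198xf32>
}
```

Presumably, with the 1->? mismatch above, the side the pass can prove static gets collapsed to `198` while the dynamic side forces the rebuilt expand_shape to produce `?x198`, which is exactly the static-collapsed/dynamic-expanded combination the verifier rejects.

---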
Summary so far:

---
Latest status: after updating to the latest torch-mlir and IREE, I'm encountering an Unsqueeze error. Either something fixed this, or the Unsqueeze error is masking it.

---
Next steps for fixing this:

---
https://github.com/nod-ai/e2eshark-reports/blob/main/2024-06-12/onnx_reports/statusreport.md |
We are having the following problems with the ONNX lowering into tensor.expand_shape in these models (beit-base-patch16-224-pt22k-ft22k, deit-small-distilled-patch16-224, vit-base-patch16-224). Repro instructions can be found here: