xnnpack backend fails on convolutions after export.save -> export.load #5265

mads-oestergaard · 2024-09-11T14:42:04Z

🐛 Describe the bug

I would like to use torch.export.save / torch.export.load as the handoff format for exporting with executorch (it would let me run executorch in a container), but for some reason partitioning with the xnnpack backend behaves differently after the round trip.

It's simple to reproduce: save an ExportedProgram, load it again directly afterwards, and the error arises. I did a little digging, and it seems that the convolution weight node is a placeholder rather than a get_attr, but I couldn't work out the logic for fixing it.

Am I using torch.export.save / .load incorrectly with executorch, or is this a bug in torch.export or executorch?

import torch.nn as nn
import torch
from torch.export import save, load, export, ExportedProgram
from executorch import exir
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir.capture._config import ExecutorchBackendConfig
from executorch.exir.passes.memory_planning_pass import MemoryPlanningPass


class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()

        self.operation = nn.Conv1d(1, 1, 32)

    def forward(self, inputs: torch.Tensor):
        output = self.operation(inputs)

        return output


model = SimpleModel()
example_args = (torch.zeros(1, 1, 128),)

aten_dialect: ExportedProgram = export(model, example_args)

should_fail = True
if should_fail:
    save(aten_dialect, "tmp.pt2")
    aten_dialect = load("tmp.pt2")

edge_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect).to_backend(
    XnnpackPartitioner()
)
executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch(
    ExecutorchBackendConfig(
        passes=[],
        memory_planning_pass=MemoryPlanningPass(
            "greedy",
            alloc_graph_input=True,
            alloc_graph_output=True,
        ),
    )
)

# Save the compiled .pte program
with open("simple_model.pte", "wb") as f:
    f.write(executorch_program.buffer)

The error:

INFO:executorch.backends.xnnpack.partition.xnnpack_partitioner:Found 1 subgraphs to be partitioned.
Traceback (most recent call last):
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_manager.py", line 271, in __call__
    res = fn(module)
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_base.py", line 41, in __call__
    res = self.call(graph_module)
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/passes/conv1d_unsqueeze_pass.py", line 120, in call
    raise AssertionError(
AssertionError: Expected op for convolution weight node to be a get_attr node or a parameter

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mads/Repos/executorch/bug-report.py", line 30, in <module>
    edge_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect).to_backend(
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1166, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner)
  File "/usr/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 384, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 299, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 230, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/usr/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 114, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/xnnpack_preprocess.py", line 122, in preprocess
    ep = XNNPACKPassManager(ep, passes=passes).transform()
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/passes/__init__.py", line 86, in transform
    ep = _transform(ep, transform_pass)
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 179, in _transform
    res = pm(self.graph_module)
  File "/home/mads/Repos/.executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_manager.py", line 297, in __call__
    raise Exception(msg) from e  # noqa: TRY002
Exception: An error occurred when running the 'Conv1dUnsqueezePass' pass after the following passes: []

Versions

Collecting environment information...
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.35

Python version: 3.10.13 (main, Aug 25 2023, 13:20:03) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-101-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 4060 Ti
GPU 1: NVIDIA GeForce RTX 4060 Ti

Nvidia driver version: 545.23.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 9
CPU max MHz: 4500.0000
CPU min MHz: 800.0000
BogoMIPS: 8400.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 1 MiB (4 instances)
L3 cache: 8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] executorch==0.3.0a0+7d77d78
[pip3] numpy==2.1.1
[pip3] torch==2.4.0+cpu
[pip3] torchaudio==2.4.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.19.0
[conda] Could not collect


davidlin54 · 2024-09-11T23:46:29Z

cc @digantdesai

JacobSzwejbka · 2024-09-13T16:35:54Z

@zhxchen17 @angelayi

mcr229 · 2024-09-13T16:56:15Z

Hmm, if this works before save/load, I wonder if this is a problem with export. What's failing is that we expect one of the arguments of the convolution to be a constant weight tensor, but our check to find it is failing. If that is the case, then I suspect something in the ExportedProgram's state dict or graph signature gets messed up when calling save/load.

zhxchen17 · 2024-09-13T17:02:15Z

So basically I think it's caused by an incorrect heuristic we put here https://github.com/pytorch/pytorch/blob/main/torch/_export/serde/serialize.py#L1952 which always converts every positional arg with a default into a kwarg. Right now I think the best way to fix this is to add a field indicating whether the arg was originally positional or not.

I guess we haven't uncovered this for a long time because a lot of backends have no concept of kwargs, but for Python it makes sense.
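To make the described heuristic concrete, here is a toy, standard-library-only sketch. It is not the code in serialize.py: the schema, `Param`, and all function names are hypothetical, and the schema is only loosely modeled on a convolution signature. It shows how storing arguments by name and then restoring "anything with a default" as a kwarg changes the call shape, and how a per-argument positional flag (the fix proposed above) would preserve it.

```python
from dataclasses import dataclass
from typing import Any

_NO_DEFAULT = object()  # sentinel: parameter has no default value


@dataclass(frozen=True)
class Param:
    name: str
    default: Any = _NO_DEFAULT


# Hypothetical schema, loosely shaped like conv1d(input, weight, bias=None, stride=1).
SCHEMA = [Param("input"), Param("weight"), Param("bias", None), Param("stride", 1)]


def serialize(args, kwargs):
    """Store every argument by name, forgetting how it was originally passed."""
    named = {param.name: value for param, value in zip(SCHEMA, args)}
    named.update(kwargs)
    return named


def deserialize_with_heuristic(named):
    """Toy version of the heuristic: any parameter with a default comes back as a kwarg."""
    args, kwargs = [], {}
    for param in SCHEMA:
        if param.name not in named:
            continue
        if param.default is _NO_DEFAULT:
            args.append(named[param.name])
        else:
            kwargs[param.name] = named[param.name]
    return tuple(args), kwargs


def serialize_with_flag(args, kwargs):
    """Variant that also records whether each argument was positional."""
    entries = [
        {"name": param.name, "value": value, "positional": True}
        for param, value in zip(SCHEMA, args)
    ]
    entries += [
        {"name": name, "value": value, "positional": False}
        for name, value in kwargs.items()
    ]
    return entries


def deserialize_with_flag(entries):
    """Rebuild the call exactly as it was written."""
    args = tuple(e["value"] for e in entries if e["positional"])
    kwargs = {e["name"]: e["value"] for e in entries if not e["positional"]}
    return args, kwargs


call_args, call_kwargs = ("x", "w", "b"), {}
print(deserialize_with_heuristic(serialize(call_args, call_kwargs)))
print(deserialize_with_flag(serialize_with_flag(call_args, call_kwargs)))
```

In this toy, the heuristic moves `bias` from the positional arguments into the kwargs, so any consumer that indexes into the positional args sees a different node after the round trip, while the flagged variant returns the call unchanged.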

mads-oestergaard · 2024-09-17T11:51:47Z

Any ideas for working around this quirk?

JacobSzwejbka added labels: bug (Something isn't working), module: exir (Issues related to Export IR), module: xnnpack (Issues related to xnnpack delegation) · Sep 13, 2024

JacobSzwejbka assigned digantdesai · Sep 13, 2024
