
Canonicalize flow.tensor.clone to keep it more local to usage #5291

Closed
benvanik opened this issue Apr 2, 2021 · 0 comments · Fixed by #5292

benvanik (Collaborator) commented Apr 2, 2021

Now that we perform operations in place, flow.tensor.clone ops are being inserted to preserve correctness. In cases of wide fan-out these clones cause pathological behavior:

      %22 = flow.dispatch @serving_default_dispatch_21::@serving_default_dispatch_21[%c384, %c4, %c1]() : () -> tensor<1x4x384xf32>
      %23 = flow.tensor.clone %22 : tensor<1x4x384xf32>
      %24 = flow.tensor.clone %23 : tensor<1x4x384xf32>
      %25 = flow.tensor.clone %24 : tensor<1x4x384xf32>
      %26 = flow.tensor.clone %25 : tensor<1x4x384xf32>
      %27 = flow.tensor.clone %26 : tensor<1x4x384xf32>
      %28 = flow.tensor.clone %27 : tensor<1x4x384xf32>
      %29 = flow.tensor.clone %28 : tensor<1x4x384xf32>
      %30 = flow.tensor.clone %29 : tensor<1x4x384xf32>
      %31 = flow.tensor.clone %30 : tensor<1x4x384xf32>
      %32 = flow.tensor.clone %31 : tensor<1x4x384xf32>
      %33 = flow.tensor.clone %32 : tensor<1x4x384xf32>
      %34 = flow.tensor.clone %33 : tensor<1x4x384xf32>
      %35 = flow.tensor.clone %34 : tensor<1x4x384xf32>
      %36 = flow.tensor.clone %35 : tensor<1x4x384xf32>
      %37 = flow.tensor.clone %36 : tensor<1x4x384xf32>
      %38 = flow.tensor.clone %37 : tensor<1x4x384xf32>
      %39 = flow.tensor.clone %38 : tensor<1x4x384xf32>
      %40 = flow.tensor.clone %39 : tensor<1x4x384xf32>
      %41 = flow.tensor.clone %40 : tensor<1x4x384xf32>
      %42 = flow.tensor.clone %41 : tensor<1x4x384xf32>
      %43 = flow.tensor.clone %42 : tensor<1x4x384xf32>
      %44 = flow.tensor.clone %43 : tensor<1x4x384xf32>
      %45 = flow.tensor.clone %44 : tensor<1x4x384xf32>
      %46 = flow.dispatch @serving_default_dispatch_22::@serving_default_dispatch_22[%c384, %c4, %c1](%21, %22) : (tensor<4x384x384xf32>, tensor<1x4x384xf32>) -> %22
      // uses of %23-%46 throughout the model

Some simple code motion will help here: sinking each clone to immediately before its first use will shorten the lifetime of these transient tensors.
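
For illustration only, a rough sketch of that kind of sink-to-first-use rewrite as an MLIR pattern is shown below. This is not the actual change from #5292; the op class name (IREE::Flow::TensorCloneOp), the header path, and the exact rewriter calls are assumptions based on the IR above:

      // Rough sketch only (not the change from #5292): sinks a
      // flow.tensor.clone to immediately before its first same-block user.
      // The op class name and header path below are assumptions.
      #include "iree/compiler/Dialect/Flow/IR/FlowOps.h"  // assumed path
      #include "mlir/IR/PatternMatch.h"

      namespace {
      struct SinkTensorClonePattern
          : public mlir::OpRewritePattern<
                mlir::iree_compiler::IREE::Flow::TensorCloneOp> {
        using OpRewritePattern::OpRewritePattern;

        mlir::LogicalResult matchAndRewrite(
            mlir::iree_compiler::IREE::Flow::TensorCloneOp cloneOp,
            mlir::PatternRewriter &rewriter) const override {
          // Find the earliest user of the cloned value within the same block.
          mlir::Operation *firstUser = nullptr;
          for (mlir::Operation *user : cloneOp.getResult().getUsers()) {
            if (user->getBlock() != cloneOp->getBlock()) continue;
            if (!firstUser || user->isBeforeInBlock(firstUser)) firstUser = user;
          }
          // Nothing to do if there is no same-block user or the clone already
          // sits directly above it (this also guarantees convergence).
          if (!firstUser || cloneOp->getNextNode() == firstUser) {
            return mlir::failure();
          }
          // Move the clone right before its first user, shrinking its live range.
          rewriter.updateRootInPlace(cloneOp,
                                     [&] { cloneOp->moveBefore(firstUser); });
          return mlir::success();
        }
      };
      }  // namespace

Restricting the motion to users in the same block keeps the rewrite trivially correct with respect to dominance, and the adjacency check makes the pattern a no-op once each clone already sits next to its first user.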

@benvanik benvanik added compiler/dialects Relating to the IREE compiler dialects (flow, hal, vm) performance ⚡ Performance/optimization related work across the compiler and runtime labels Apr 2, 2021
@benvanik benvanik self-assigned this Apr 2, 2021
benvanik added a commit that referenced this issue Apr 2, 2021
This dramatically reduces the lifetime of the clones.
Fixes #5291.