
Memory allocations of 5-argument mul! when involving struct #566

Open

albertomercurio opened this issue Oct 13, 2024 · 3 comments · May be fixed by #567

@albertomercurio (Contributor)
Hello,

Here is a simple example where the 5-argument mul! allocates memory. In some cases the allocations appear only when the sparse matrix A is wrapped in a struct; in other cases they appear even without the struct.

julia> using LinearAlgebra

julia> using SparseArrays

julia> using BenchmarkTools

julia> T = Float64;

julia> N = 1000;

julia> A = sprand(T, N, N, 5 / N);

julia> x = rand(T, N);

julia> y = similar(x);

julia> α = rand(T);

julia> β = rand(T);

julia> @benchmark mul!($y, $A, $x, $α, false) # This is ok
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.092 μs …  3.589 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.146 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.184 μs ± 68.468 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁▇█▄                                               
  ▂▂▂▂▂▂▂▂▄████▇▄▂▂▂▂▁▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▃▄▆██▆▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  2.09 μs        Histogram: frequency by time        2.35 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark mul!($y, $A, $x, $α, $β) # This is not ok
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.164 μs …   5.236 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.193 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.261 μs ± 138.081 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▅▇█▇▆▄▂▂▂▁           ▁▁    ▁  ▂▄▆▆▆▄▃                ▁▁    ▂
  ████████████▆▅▄▃▅▄▅▃▆██████▇███████████▆▆▅▅▄▄▃▁▅▄▃▅▅▆█████▆ █
  2.16 μs      Histogram: log(frequency) by time      2.54 μs <

 Memory estimate: 80 bytes, allocs estimate: 2.

Now involving a simple struct wrapper:

julia> struct my_MatrixOperator{MT}
           A::MT
       end

julia> function LinearAlgebra.mul!(v::AbstractVecOrMat, L::my_MatrixOperator, u::AbstractVecOrMat)
           mul!(v, L.A, u)
       end

julia> function LinearAlgebra.mul!(v::AbstractVecOrMat,
               L::my_MatrixOperator,
               u::AbstractVecOrMat,
               α,
               β)
           mul!(v, L.A, u, α, β)
       end

julia> A_op_my = my_MatrixOperator(A);

julia> @benchmark mul!($y, $A_op_my, $x, $α, false) # This is no longer ok
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.553 μs …   5.645 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.627 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.653 μs ± 116.667 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▆▇▇▆▃▁▂▅██▇▅▂              ▁ ▂▃▄▄▃▂   ▂▃▃▂            ▁▂▂▂ ▃
  ██████████████▇▄▅▄▃▇████▇▇▇█████████▇▆██████▅▅▄▁▄▁▄▅▆▅▆████ █
  2.55 μs      Histogram: log(frequency) by time         3 μs <

 Memory estimate: 80 bytes, allocs estimate: 2.

julia> @benchmark mul!($y, $A_op_my, $x, $α, $β) # This is not ok
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.160 μs …  12.129 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.190 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.254 μs ± 176.229 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▅██▆▄▂          ▁▅▆▆▅▃▂▁  ▁▃▄▃▂              ▁▂▁           ▂
  ████████▆▅▅▃▃▄▄▅▇████████▇▇██████▆▆▅▅▅▁▅▅▅▅▅▇█████▆▆▅▅▄▅▄▁▃ █
  2.16 μs      Histogram: log(frequency) by time       2.6 μs <

 Memory estimate: 80 bytes, allocs estimate: 2.

julia> @benchmark mul!($y, $A_op_my, $x) # This is ok
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.030 μs …  3.575 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.053 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.064 μs ± 54.170 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▄▆███▇▅▄▁                                        ▁▁       ▂
  ██████████▇▅▄▄▃▁▁▄▄▁▁▄▃▁▄▆▇███▇█▇▇▇▅▆▅▄▄▅▅▄▄▅▄▅▄▄▇███▇▅▄▄▆ █
  2.03 μs      Histogram: log(frequency) by time     2.31 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Possible partial fix

I've noticed that defining the functions above with the @inline or Base.@constprop :aggressive macros fixes the mul!($y, $A_op_my, $x, $α, false) case, but why that helps is still a mystery to me. A sketch of the annotated wrapper is shown below.
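For concreteness, a minimal sketch of the annotation just described, applied to the 5-argument wrapper (Base.@constprop :aggressive can be used in place of @inline; per the note above, either reportedly fixes the β = false case):

@inline function LinearAlgebra.mul!(v::AbstractVecOrMat,
           L::my_MatrixOperator,
           u::AbstractVecOrMat,
           α,
           β)
       # Forward to the sparse 5-argument mul!. The annotation lets the
       # compiler constant-fold the α/β checks at the call site, which
       # reportedly fixes the mul!(y, A_op_my, x, α, false) case.
       mul!(v, L.A, u, α, β)
   end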

@dkarrasch (Member)
The memory allocation does not scale with the size of the matrix/vector, and given that it depends on the scalar factors alpha and beta, I strongly suspect this is related to the MulAddMul construction. That, however, benefits from aggressive constant propagation or inlining.
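For context, a simplified paraphrase of that internal machinery (the real definition lives in LinearAlgebra's matmul code; only the shape matters here):

# MulAddMul records at the *type level* whether α == 1 and β == 0,
# so four concrete types are possible for a given TA/TB:
struct MulAddMul{ais1, bis0, TA, TB}
    alpha::TA
    beta::TB
end

# Its constructor branches on the runtime values of α and β, so without
# constant propagation the compiler cannot infer which of the four types
# is returned; passing the result on then requires boxing plus a dynamic
# dispatch, consistent with the 80 bytes / 2 allocations observed above.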

Without inlining (current):

julia> @code_typed mul!(y, A, x, α, β)
CodeInfo(
1 ─ %1  = Base.eq_float(alpha, 1.0)::Bool
└──       goto #5 if not %1
2 ─ %3  = Base.eq_float(beta, 0.0)::Bool
└──       goto #4 if not %3
3 ─ %5  = %new(LinearAlgebra.MulAddMul{true, true, Float64, Float64}, alpha, beta)::LinearAlgebra.MulAddMul{true, true, Float64, Float64}
└──       goto #8
4 ─ %7  = %new(LinearAlgebra.MulAddMul{true, false, Float64, Float64}, alpha, beta)::LinearAlgebra.MulAddMul{true, false, Float64, Float64}
└──       goto #8
5 ─ %9  = Base.eq_float(beta, 0.0)::Bool
└──       goto #7 if not %9
6 ─ %11 = %new(LinearAlgebra.MulAddMul{false, true, Float64, Float64}, alpha, beta)::LinearAlgebra.MulAddMul{false, true, Float64, Float64}
└──       goto #8
7 ─ %13 = %new(LinearAlgebra.MulAddMul{false, false, Float64, Float64}, alpha, beta)::LinearAlgebra.MulAddMul{false, false, Float64, Float64}
└──       goto #8
8 ┄ %15 = φ (#3 => %5, #4 => %7, #6 => %11, #7 => %13)::LinearAlgebra.MulAddMul{ais1, bis0, Float64, Float64} where {ais1, bis0}
│   %16 = LinearAlgebra.generic_matvecmul!(y, 'N', A, x, %15)::Vector{Float64}
└──       goto #9
9 ─       return %16
) => Vector{Float64}

With inlining:

julia> @code_typed mul!(y, A, x, α, β)
CodeInfo(
1 ─ %1 = Base.eq_float(alpha, 1.0)::Bool
└──      goto #5 if not %1
2 ─ %3 = Base.eq_float(beta, 0.0)::Bool
└──      goto #4 if not %3
3 ─      goto #8
4 ─      goto #8
5 ─ %7 = Base.eq_float(beta, 0.0)::Bool
└──      goto #7 if not %7
6 ─      goto #8
7 ─      goto #8
8 ┄      invoke SparseArrays._spmatmul!(y::Vector{Float64}, A::SparseMatrixCSC{Float64, Int64}, x::Vector{Float64}, alpha::Float64, beta::Float64)::Any
└──      goto #9
9 ─      return y
) => Vector{Float64}

You see, it doesn't even mess with the MulAddMul stuff! I think it's okay to inline everything up to _spmatmul!, because all the previous calls in the chain exist only for redirection, so we're not at risk of piling up more and more code.

So, if you have a minute, please submit a PR.
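For anyone following along, the dispatch chain can be traced from the REPL to find the methods in question (a sketch; the methods and files reported depend on the installed Julia and SparseArrays versions):

julia> @which mul!(y, A, x, α, β)   # the 5-argument entry point

julia> @which LinearAlgebra.generic_matvecmul!(y, 'N', A, x,
              LinearAlgebra.MulAddMul(α, β))   # the redirection method from the IR above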

@albertomercurio (Contributor, Author)
So which functions should I inline? Everything up to _spmatmul!? And does this fix the general case mul!(y, A_op_my, x, α, β), or just mul!(y, A_op_my, x, α, false)? It seems to be both.

@dkarrasch (Member)
For what I posted, annotating the generic_matvecmul! method is enough; the others further down the call chain may already have an annotation, or are inlined automatically. A quick check is sketched below.
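Once the annotation is in place, BenchmarkTools (already loaded above) gives a quick way to confirm both cases; @ballocated interpolates the globals so they don't add spurious allocations:

julia> @ballocated mul!($y, $A_op_my, $x, $α, $β)     # expect 0 after the fix

julia> @ballocated mul!($y, $A_op_my, $x, $α, false)  # expect 0 after the fix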

dkarrasch linked a pull request (#567) on Oct 14, 2024 that will close this issue.