
Improve mul!, AddedOperator, and update_coefficients! to remove memory allocations #249

Open
wants to merge 12 commits into base: master
Conversation

albertomercurio
Contributor

This PR fixes #246 and the remaining allocations of #247.

This is related to this issue in the SparseArrays.jl package.

@albertomercurio
Contributor Author

Ok, I'm done.

@albertomercurio
Contributor Author

I've also noticed that the time-dependent case is still not fixed. As an example:

T = ComplexF64
N = 500
M = 10
# A1 = MatrixOperator(rand(T, N, N))
# A2 = MatrixOperator(rand(T, N, N))
A1 = MatrixOperator(sprand(T, N, N, 5 / N))
A2 = MatrixOperator(sprand(T, N, N, 5 / N))
A3 = MatrixOperator(sprand(T, N, N, 5 / N))

coeff1(a, u, p, t) = sin(p.ω * t)
coeff2(a, u, p, t) = cos(p.ω * t)
coeff3(a, u, p, t) = sin(p.ω * t) * cos(p.ω * t)

c1 = ScalarOperator(rand(T), coeff1)
c2 = ScalarOperator(rand(T), coeff2)
c3 = ScalarOperator(rand(T), coeff3)

H = c1 * A1 + c2 * A2 + c3 * A3

u = rand(T, N);
du = similar(u);
p = (ω = 0.1,);
t = 0.1;

@benchmark $H($du, $u, $p, $t)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.964 μs …  10.530 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.045 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.079 μs ± 211.087 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂▆▇██▇▆▅▃▂▁           ▁▁▁ ▁       ▁▁▁▁                 ▂
  ▄▁▄▃▁█████████████▆▆▅▆▆▆▆▆█████████▆▇▇██████▇▅▇▆▅▆▄▄▄▅▅▃▄▁▄ █
  4.96 μs      Histogram: log(frequency) by time      5.48 μs <

 Memory estimate: 752 bytes, allocs estimate: 8.

And the number of allocations scales with the number of operators in the sum. I actually don't understand what the problem is.

Anyhow, the time-independent code in #246 is fixed.

@albertomercurio
Contributor Author

Ok, the time-dependent case should also be fixed, at least for the case I studied. I basically followed the discussion in this thread, which recommended using generated functions when dealing with tuples of different types.

Indeed, the memory allocations are absent now.

@albertomercurio albertomercurio changed the title Improve mul! to remove memory allocations Improve mul!, AddedOperator, and update_coefficients! to remove memory allocations Oct 15, 2024
iszero(L.λ) && return lmul!(false, v)
a = convert(Number, L.λ)
mul!(v, L.L, u, a, false)
end

function LinearAlgebra.mul!(v::AbstractVecOrMat,
@inline function LinearAlgebra.mul!(v::AbstractVecOrMat,
Member

why is this needed?

Contributor Author

It is related to this issue in SparseArrays.jl. We currently need it to avoid extra allocations.
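To make the workaround concrete, here is a hedged sketch of the pattern in the diff above (the variables are illustrative, not the exact PR code): converting the scalar to a concretely-typed `Number` and passing a literal `false` beta keeps the sparse 5-arg `mul!` on its non-allocating path.

```julia
using SparseArrays, LinearAlgebra

# Illustrative sketch (names are assumptions): a concretely-typed scalar
# `a`, mirroring convert(Number, L.λ), plus the Bool `false` beta lets
# the sparse 5-arg mul! avoid the allocating generic fallback.
A = sprand(ComplexF64, 100, 100, 0.05)
u = rand(ComplexF64, 100)
v = similar(u)
λ = 2.0 + 0.0im
a = convert(Number, λ)   # concrete scalar, no abstract boxing
mul!(v, A, u, a, false)  # v = a * A * u; beta = false never reads v
```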

Comment on lines -459 to +511
for op in L.ops
iszero(op) && continue
mul!(v, op, u, α, true)
ops_types = L.parameters[2].parameters
N = length(ops_types)
Member

is this actually required?

Contributor Author

The AddedOperator contains a Tuple of objects of different types. A plain for loop over it leads to runtime dispatch and extra allocations. Following this thread, the recommendation was to use @generated functions; that way we work directly with the individual types in the Tuple and unroll the loop.
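A minimal sketch of that unrolling technique (function and variable names are hypothetical, not the exact PR code): the loop over the ops Tuple is expanded at compile time, so each `mul!` call dispatches statically.

```julia
using LinearAlgebra

# @generated: the body runs at compile time with access to the argument
# *types*, and returns the expression that will be compiled.
@generated function unrolled_mul!(v, ops::Tuple, u, α)
    N = length(ops.parameters)  # number of operators, known from the Tuple type
    calls = [:(mul!(v, ops[$i], u, α, true)) for i in 1:N]
    quote
        $(calls...)   # one statically-dispatched mul! per operator
        v
    end
end
```

Each `ops[$i]` index is a compile-time constant, so indexing the heterogeneous Tuple is type-stable and never falls back to runtime dispatch.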

Member

A recursive implementation is probably cleaner and would get more compilation reuse?

Contributor Author

How exactly?
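For reference, one way such a recursive formulation could look (a sketch under assumptions, not the maintainers' exact proposal): peel off the first operator, apply it, and recurse on `Base.tail`. Each recursion level specializes on a shorter Tuple type, so operators sharing a common tail can reuse compiled code.

```julia
using LinearAlgebra

# Base case: empty Tuple, nothing left to accumulate.
apply_ops!(v, ::Tuple{}, u, α) = v

# Recursive case: apply the first operator, then recurse on the rest.
function apply_ops!(v, ops::Tuple, u, α)
    mul!(v, first(ops), u, α, true)   # v += α * ops[1] * u
    apply_ops!(v, Base.tail(ops), u, α)
end
```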

@ChrisRackauckas
Member

Needs tests.

@albertomercurio
Contributor Author

I didn't add any new functionality, so the current tests should be fine. This was just to remove extra allocations and improve performance. But if you want to add some tests, do you have something specific in mind?

@ChrisRackauckas
Member

A test for zero allocations so it doesn't regress

@albertomercurio
Contributor Author

Ok, I added the tests. It seems that @allocations returns 1 inside the test instead of 0, unlike BenchmarkTools.jl; outside of the test, @allocations returns 0.

Anyway, the number of allocations on the current master branch is on the order of hundreds.

@ChrisRackauckas
Member

It's allocating a return value because of the global scope, IIUC. Wrap it in a function that then returns nothing.

@albertomercurio
Contributor Author

Still the same. Outside the tests it returns 0 allocations with both @allocations and BenchmarkTools.jl; inside the test it returns 1.

@ChrisRackauckas
Member

@oscardssmith rfc

@oscardssmith

This happens because apply_op! is a dispatch, so it's allocating for the method call. It does seem to me that this might be a bug in the @allocations macro, but I'm not entirely sure. The easiest way to get around it is to do something like

test_apply_noalloc(H, du, u, p, t) = @test (@allocations apply_op!(H, du, u, p, t)) == 0

since that will introduce a function barrier.
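A self-contained illustration of that barrier effect (the toy `apply!` stands in for the PR's apply_op!; the names here are assumptions): with the measurement inside its own function, argument types are concrete and the measured call is statically dispatched and already compiled when @allocations runs.

```julia
using LinearAlgebra, Test

# Toy stand-in for the PR's apply_op! (an assumption for illustration).
apply!(du, A, u) = (mul!(du, A, u); nothing)

# Function barrier: inside this function the argument types are concrete,
# so the inner call itself incurs no dispatch allocation.
function test_apply_noalloc(du, A, u)
    apply!(du, A, u)  # warm up so compilation isn't counted
    @test (@allocations apply!(du, A, u)) == 0
end
```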

Successfully merging this pull request may close these issues.

AddedOperator of Sparse matrices allocates memory
3 participants