Improve `mul!`, `AddedOperator`, and `update_coefficients!` to remove memory allocations #249
Conversation
Ok, I'm done.
I've also noticed that the time-dependent case is still not fixed. As an example:

```julia
using LinearAlgebra, SparseArrays, BenchmarkTools
using SciMLOperators

T = ComplexF64
N = 500
M = 10
# A1 = MatrixOperator(rand(T, N, N))
# A2 = MatrixOperator(rand(T, N, N))
A1 = MatrixOperator(sprand(T, N, N, 5 / N))
A2 = MatrixOperator(sprand(T, N, N, 5 / N))
A3 = MatrixOperator(sprand(T, N, N, 5 / N))

coeff1(a, u, p, t) = sin(p.ω * t)
coeff2(a, u, p, t) = cos(p.ω * t)
coeff3(a, u, p, t) = sin(p.ω * t) * cos(p.ω * t)

c1 = ScalarOperator(rand(T), coeff1)
c2 = ScalarOperator(rand(T), coeff2)
c3 = ScalarOperator(rand(T), coeff3)

H = c1 * A1 + c2 * A2 + c3 * A3

u = rand(T, N);
du = similar(u);
p = (ω = 0.1,);
t = 0.1;

@benchmark $H($du, $u, $p, $t)
```

The number of allocations scales with the number of operators in the sum, and I don't actually understand what the problem is. Anyhow, the time-independent code in #246 is fixed.
Ok, the time-dependent case should also be fixed now, at least for the case I studied. I basically followed the discussion in this thread, where they recommended using generated functions when dealing with tuples of different types. Indeed, the memory allocations are gone now.
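For context, one common source of this kind of allocation in Julia (not necessarily the exact cause here) is storing a coefficient function in an abstractly typed field: every call then goes through dynamic dispatch and may box its result. The sketch below uses hypothetical `ScalarCoeffAbstract`/`ScalarCoeff` structs, not the actual SciMLOperators definitions, purely to illustrate the difference.

```julia
# Abstractly typed field: `Function` is abstract, so `c.f(...)` is a
# dynamic dispatch at every call site and can allocate.
struct ScalarCoeffAbstract
    val::ComplexF64
    f::Function
end

# Parameterizing on the function's concrete type lets Julia specialize
# `evalcoeff` per coefficient, removing the dynamic dispatch.
struct ScalarCoeff{F}
    val::ComplexF64
    f::F
end

# Evaluate the stored coefficient function (works for both structs).
evalcoeff(c, u, p, t) = c.f(c.val, u, p, t)

coeff(a, u, p, t) = sin(p.ω * t)
c = ScalarCoeff(1.0 + 0im, coeff)
evalcoeff(c, nothing, (ω = 0.1,), 0.1)
```

The parametric version is the usual pattern for keeping function-carrying structs allocation-free in hot loops.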
```diff
     iszero(L.λ) && return lmul!(false, v)
     a = convert(Number, L.λ)
     mul!(v, L.L, u, a, false)
 end

-function LinearAlgebra.mul!(v::AbstractVecOrMat,
+@inline function LinearAlgebra.mul!(v::AbstractVecOrMat,
```
why is this needed?
It is related to this issue in SparseArrays.jl. We currently need it to avoid extra allocations.
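For readers unfamiliar with the five-argument form used in the diff above: `mul!(v, A, u, α, β)` computes `v = α*A*u + β*v` in place, and `β = false` acts as a "strong zero" so stale values in `v` (even `NaN`s) are discarded. A minimal self-contained sketch:

```julia
using LinearAlgebra, SparseArrays

A = sprand(10, 10, 0.3)   # sparse operator
u = rand(10)
v = fill(NaN, 10)         # stale destination contents

# β = false overwrites v entirely: v = 2.0 * A * u
mul!(v, A, u, 2.0, false)

v ≈ 2.0 .* (A * u)        # true
```

This is why the scaled-operator path above can pass `a` and `false` directly instead of materializing `a * A` first.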
```diff
-    for op in L.ops
-        iszero(op) && continue
-        mul!(v, op, u, α, true)
+    ops_types = L.parameters[2].parameters
+    N = length(ops_types)
```
is this actually required?
The `AddedOperator` contains a `Tuple` of objects of different types. Performing a simple `for` loop over it would lead to runtime dispatch and extra allocations. Following this thread, they recommended using `@generated` functions. In this way we work directly with the individual types of the `Tuple`, and we unroll the `for` loop.
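As a rough illustration of the `@generated` approach (using a hypothetical `AddedOp` stand-in, not the actual SciMLOperators type, and assuming at least one operator in the tuple):

```julia
using LinearAlgebra

# Minimal stand-in for a sum of operators stored as a heterogeneous tuple.
struct AddedOp{T<:Tuple}
    ops::T
end

# The generator sees only the *types*, so it can emit one statically
# dispatched mul! call per tuple element, unrolling the loop at
# compile time: v = α * (sum of ops) * u + β * v.
@generated function mul_unrolled!(v, L::AddedOp{T}, u, α, β) where {T}
    N = length(T.parameters)
    calls = [:(mul!(v, L.ops[$i], u, α, true)) for i in 2:N]
    quote
        mul!(v, L.ops[1], u, α, β)   # first op consumes β
        $(calls...)                  # remaining ops accumulate
        v
    end
end

A = [1.0 0.0; 0.0 1.0]
B = [2.0 0.0; 0.0 2.0]
L = AddedOp((A, B))
u = [1.0, 2.0]
v = zeros(2)
mul_unrolled!(v, L, u, 1.0, 0.0)   # v = (A + B) * u
```

Since each `mul!` call is resolved at compile time against the concrete element type, there is no runtime dispatch in the unrolled body.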
A recursive implementation is probably cleaner and would get more compilation reuse?
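A sketch of what such a recursive version might look like (hypothetical `mul_rec!` helper; the caller is assumed to have already applied `β` to `v`, so each step only accumulates):

```julia
using LinearAlgebra

# Base case: no operators left, return the accumulator.
mul_rec!(v, ops::Tuple{}, u, α) = v

# Recursive case: apply the first operator, then recurse on the tail.
# Each method instance specializes on a concrete tuple type, and two
# operator sums sharing a tail can reuse the same compiled methods.
function mul_rec!(v, ops::Tuple, u, α)
    mul!(v, first(ops), u, α, true)
    mul_rec!(v, Base.tail(ops), u, α)
end

A = [1.0 0.0; 0.0 1.0]
B = [2.0 0.0; 0.0 2.0]
u = [1.0, 2.0]
v = zeros(2)                 # β already applied: start from zero
mul_rec!(v, (A, B), u, 1.0)  # v = (A + B) * u
```

The compiler typically unrolls this recursion for small tuples, giving the same static dispatch as the `@generated` version without metaprogramming.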
How exactly?
Needs tests.
I didn't add any new functionality, so the current tests should be fine. This was just to remove extra allocations and improve performance. But if you want me to add some tests, do you have something specific in mind?
A test for zero allocations, so it doesn't regress.
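One way to write such a regression test (a generic sketch with a hypothetical `mul_allocs` helper, not the PR's actual test code): measure `@allocated` inside a function so global-scope boxing doesn't pollute the count, and warm the call up first so compilation isn't measured.

```julia
using LinearAlgebra

# Wrap the measurement in a function barrier; the first call triggers
# compilation, the second is the one we actually measure.
function mul_allocs(v, A, u)
    mul!(v, A, u)                 # warm up / compile
    @allocated mul!(v, A, u)      # should be 0 for in-place dense mul!
end

A = rand(100, 100)
u = rand(100)
v = similar(u)
mul_allocs(v, A, u)
```

In a test suite this would become e.g. `@test mul_allocs(v, A, u) == 0`, which fails loudly if a future change reintroduces allocations.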
Ok, I added the tests. Anyway, the number of allocations on the current master branch is on the order of hundreds.
It's allocating a return because of the global scope, IIUC. Wrap it in a function that then returns.
Still the same. Outside the tests it returns 0 allocations with both.
@oscardssmith rfc |
This happens because
since that will introduce a function barrier.
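A function barrier, for reference, is the standard trick for containing type instability: the outer code may be type-unstable, but each call into the inner function dispatches once and then runs fully specialized code. A minimal sketch (generic illustration, not the PR's code):

```julia
using LinearAlgebra

# Inner function: specialized on the concrete type of `op`, so the
# multiply-accumulate inside runs without dynamic dispatch.
add_one!(acc, op, u) = (acc .+= op * u; acc)

# Outer function: iterating a heterogeneous tuple is type-unstable,
# but each `add_one!` call crosses a function barrier.
function sum_ops_barrier(ops::Tuple, u)
    acc = zero(first(ops) * u)
    for op in ops
        acc = add_one!(acc, op, u)
    end
    acc
end

A = [1.0 0.0; 0.0 1.0]
B = [2 0; 0 2]               # Int matrix: heterogeneous tuple
sum_ops_barrier((A, B), [1.0, 2.0])
```

Only the one dynamic dispatch per loop iteration remains; everything inside the barrier is compiled for concrete types.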
This PR fixes #246 and the remaining allocations of #247.
This is related to this issue in the SparseArrays.jl package.