
Reduce allocations for multiplying LazyTensor of sparse and dense #80

Merged
merged 17 commits into qojulia:master on Mar 27, 2023

Conversation

AmitRotem
Contributor

Following a suggestion from @amilsted regarding qojulia/QuantumOptics.jl#352:

shape, strides_j, and strides_k in the _gemm_puresparse function are now implemented as Tuples to reduce allocations.
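A minimal illustration of why this helps (the helper names below are mine, not the package's): a Vector is heap-allocated on every call, while a small Tuple of Ints is isbits and never touches the heap.

```julia
using BenchmarkTools

# shape/strides built as a Vector: a fresh heap allocation on every call
shape_as_vector(dims...) = [dims...]

# shape/strides kept as a Tuple: no heap allocation
shape_as_tuple(dims...) = dims

@btime shape_as_vector(21, 21)  # 1 allocation (the Vector)
@btime shape_as_tuple(21, 21)   # 0 allocations
```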

@codecov

codecov bot commented Feb 23, 2023

Codecov Report

Merging #80 (f91c5a0) into master (3ed0e84) will increase coverage by 0.10%.
The diff coverage is 97.29%.

@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
+ Coverage   92.61%   92.71%   +0.10%     
==========================================
  Files          24       24              
  Lines        3089     3104      +15     
==========================================
+ Hits         2861     2878      +17     
+ Misses        228      226       -2     
Impacted Files Coverage Δ
src/operators_lazytensor.jl 95.88% <96.66%> (-0.15%) ⬇️
src/operators.jl 97.42% <100.00%> (+1.08%) ⬆️
src/operators_dense.jl 92.49% <100.00%> (+0.45%) ⬆️


@Krastanov
Collaborator

This is a great improvement, thank you for taking initiative on it! Could you share a BenchmarkTools.@benchmark comparison before/after your change?

@Krastanov
Collaborator

Here is a quick before/after.

Before:

julia> b = FockBasis(20)
       op1 = destroy(b)
       op2 = create(b)
       op3 = op1⊗op1
       lop = LazySum(LazyTensor(basis(op3),[1,2],[op1,op2]),op3)
       s = basisstate(b,5)
       s = s⊗s
       ss = copy(s)
       @benchmark QuantumOptics.mul!(s,lop,s)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.354 μs …   7.528 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.405 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.441 μs ± 117.392 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 432 bytes, allocs estimate: 7.

After:

julia> b = FockBasis(20)
       op1 = destroy(b)
       op2 = create(b)
       op3 = op1⊗op1
       lop = LazySum(LazyTensor(basis(op3),[1,2],[op1,op2]),op3)
       s = basisstate(b,5)
       s = s⊗s
       ss = copy(s)
       @benchmark QuantumOptics.mul!(s,lop,s)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.344 μs …  7.348 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.384 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.400 μs ± 92.505 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 192 bytes, allocs estimate: 4.

The change in performance in this microbenchmark is minimal (no change in runtime, but half the allocations), but it is still better than the status quo.

@AmitRotem
Contributor Author

using QuantumOptics, SparseArrays

# basis
sub_system = FockBasis(3)
number_of_sub_systems = 5
full_system = sub_system^number_of_sub_systems
# state and operator to propagate
ψ0state = randstate(full_system);
ψ0op = dm(ψ0state);  # assumed definition; the original snippet left ψ0op undefined
# hamiltonian
H_sum_tensor_sparse = let ops1, ops2, n=length(sub_system), r=0.5
    ops1 = [Operator(sub_system, sprandn(ComplexF64,n,n,r)) for k=1:number_of_sub_systems]
    ops1 = ops1.+dagger.(ops1)  # make each single-site term Hermitian
    ops2 = [Operator(sub_system, sprandn(ComplexF64,n,n,r)) for k=1:number_of_sub_systems-1, j=1:2]
    ops2 = ops2.+dagger.(ops2)
    LazySum((LazyTensor(full_system, j, op) for (j,op)=enumerate(ops1))...)+
     LazySum((LazyTensor(full_system, (j,j+1), tuple(op...)) for (j,op)=enumerate(eachrow(ops2)))...)
end

Ht(H) = (_,_)->H # just because schroedinger won't propagate operators

# propagate state
@time timeevolution.schroedinger_dynamic((0.0, 1.0), ψ0state, Ht(H_sum_tensor_sparse))
@time timeevolution.schroedinger_dynamic((0.0,10.0), ψ0state, Ht(H_sum_tensor_sparse))
# propagate operator
@time timeevolution.schroedinger_dynamic((0.0, 1.0), ψ0op, Ht(H_sum_tensor_sparse))
@time timeevolution.schroedinger_dynamic((0.0,10.0), ψ0op, Ht(H_sum_tensor_sparse))
;

Which for the current release gives:

0.239708 seconds (18.42 k allocations: 1.503 MiB)
1.644933 seconds (168.87 k allocations: 11.341 MiB)
0.356556 seconds (8.47 k allocations: 2.256 MiB)
3.014540 seconds (74.08 k allocations: 8.263 MiB)

And for this branch gives:

0.194773 seconds (11.43 k allocations: 842.953 KiB)
1.789770 seconds (108.41 k allocations: 5.263 MiB)
0.359107 seconds (123 allocations: 1.492 MiB)
3.327338 seconds (123 allocations: 1.492 MiB)

@amilsted
Collaborator

Curious that the operator case allocations go down way more than the state case. Do you know why?

@AmitRotem
Contributor Author

Benchmark notebooks: release / this branch

@AmitRotem
Contributor Author

The only difference between Ket and Operator is the reshape of Ket.data to a Matrix.

@AmitRotem
Contributor Author

Also, in the notebooks you can see that propagating an operator with a dense Hamiltonian has lots of allocations, whereas propagating a state does not. Is that a LinearAlgebra issue?

@amilsted
Collaborator

> The only difference between Ket and Operator is the reshape of Ket.data to a Matrix.

Maybe it's a ReshapedArray() allocation...
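As a quick sanity check on that idea (my own illustration, not from the PR):

```julia
v = zeros(ComplexF64, 441)   # e.g. the data vector of a Ket on a 21^2-dimensional basis

reshape(v, 21, 21) isa Array                        # true: reshaping an Array yields another Array sharing memory
reshape(view(v, :), 21, 21) isa Base.ReshapedArray  # true: reshaping a non-Array wrapper yields a ReshapedArray

# Even the Array case is not free: the new Array header is itself a small heap allocation.
@allocated reshape(v, 21, 21)   # small but nonzero
```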

> Also, in the notebooks you can see that propagating an operator with a dense Hamiltonian has lots of allocations, whereas propagating a state does not. Is that a LinearAlgebra issue?

This is JuliaLang/julia#46865, which I have been trying to get fixed!

@AmitRotem
Contributor Author

AmitRotem commented Feb 25, 2023

This reduces the allocations for Ket. Not sure if it will be this simple for Bra.

But it increases the allocations for a LazySum of LazyTensor of dense operators by a bit.
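One way to compare the Ket and Bra paths directly (my own illustration; the bases, operators, and use of LinearAlgebra.mul! here are arbitrary choices, not taken from the PR's tests):

```julia
using QuantumOptics, BenchmarkTools, LinearAlgebra

b = FockBasis(20)
lop = LazyTensor(b ⊗ b, [1, 2], [destroy(b), create(b)])
ψ = basisstate(b, 5) ⊗ basisstate(b, 5)
out_ket = copy(ψ)
out_bra = dagger(out_ket)
ψbra = dagger(ψ)

@btime mul!($out_ket, $lop, $ψ)     # Ket path: out = lop * ψ
@btime mul!($out_bra, $ψbra, $lop)  # Bra path: out = ψ' * lop
```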

@amilsted
Collaborator

I think the latest changes to avoid reshaping may be broken, but otherwise this looks good.
Update: Not broken. Now just suggesting we modify the Bra version too and relax to AbstractArray.

Three review threads on src/operators_lazytensor.jl (all resolved, one outdated).
@AmitRotem
Contributor Author

Added a dimension check related to #74.

Also, fixed size of AbstractOperator.
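For context, a hedged sketch of the kind of dimension check meant here (the helper name and error messages are mine, not the PR's code): verify that the shapes are compatible before calling the in-place kernels.

```julia
# Hypothetical helper, not the PR's exact code: validate shapes before an in-place
# multiply so that mismatched inputs raise DimensionMismatch instead of writing out
# of bounds or silently producing garbage.
function _check_mul_dims(result, a, b)
    size(a, 2) == size(b, 1) ||
        throw(DimensionMismatch("cannot multiply: size(a) = $(size(a)), size(b) = $(size(b))"))
    (size(result, 1), size(result, 2)) == (size(a, 1), size(b, 2)) ||
        throw(DimensionMismatch("result has size $(size(result)), expected $((size(a, 1), size(b, 2)))"))
    nothing
end
```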

Review thread on src/operators.jl (resolved, outdated).
@amilsted merged commit 2506d0d into qojulia:master on Mar 27, 2023