
Reduce allocations for multiplying LazyTensor of sparse and dense #80

Merged
merged 17 commits into qojulia:master on Mar 27, 2023

Conversation

AmitRotem
Contributor

Following a suggestion from @amilsted regarding qojulia/QuantumOptics.jl#352:

shape, strides_j, and strides_k in the _gemm_puresparse function are now implemented as Tuples to reduce allocations.
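A minimal illustration of why this helps (the helper names below are mine, not the package's): a Vector is heap-allocated on every call, while a small Tuple of Ints is isbits and never touches the heap.

```julia
using BenchmarkTools

# shape/strides built as a Vector: a fresh heap allocation on every call
shape_as_vector(dims...) = [dims...]

# shape/strides kept as a Tuple: no heap allocation
shape_as_tuple(dims...) = dims

@btime shape_as_vector(21, 21)  # 1 allocation (the Vector)
@btime shape_as_tuple(21, 21)   # 0 allocations
```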

@codecov

codecov bot commented Feb 23, 2023

Codecov Report

Merging #80 (f91c5a0) into master (3ed0e84) will increase coverage by 0.10%.
The diff coverage is 97.29%.

@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
+ Coverage   92.61%   92.71%   +0.10%     
==========================================
  Files          24       24              
  Lines        3089     3104      +15     
==========================================
+ Hits         2861     2878      +17     
+ Misses        228      226       -2     
Impacted Files Coverage Δ
src/operators_lazytensor.jl 95.88% <96.66%> (-0.15%) ⬇️
src/operators.jl 97.42% <100.00%> (+1.08%) ⬆️
src/operators_dense.jl 92.49% <100.00%> (+0.45%) ⬆️


@Krastanov
Collaborator

This is a great improvement, thank you for taking initiative on it! Could you share a BenchmarkTools.@benchmark comparison before/after your change?

@Krastanov
Collaborator

Here is a quick before/after.

Before:

julia> b = FockBasis(20)
       op1 = destroy(b)
       op2 = create(b)
       op3 = op1⊗op1
       lop = LazySum(LazyTensor(basis(op3),[1,2],[op1,op2]),op3)
       s = basisstate(b,5)
       s = s⊗s
       ss = copy(s)
       @benchmark QuantumOptics.mul!(s,lop,s)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.354 μs …   7.528 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.405 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.441 μs ± 117.392 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 432 bytes, allocs estimate: 7.

After:

julia> b = FockBasis(20)
       op1 = destroy(b)
       op2 = create(b)
       op3 = op1⊗op1
       lop = LazySum(LazyTensor(basis(op3),[1,2],[op1,op2]),op3)
       s = basisstate(b,5)
       s = s⊗s
       ss = copy(s)
       @benchmark QuantumOptics.mul!(s,lop,s)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.344 μs …  7.348 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.384 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.400 μs ± 92.505 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 192 bytes, allocs estimate: 4.

The change in performance in this microbenchmark is minimal (no change in runtime, but half the allocations), but it is still better than the status quo.

@AmitRotem
Contributor Author

using QuantumOptics, SparseArrays

# basis
sub_system = FockBasis(3)
number_of_sub_systems = 5
full_system = sub_system^number_of_sub_systems
# state and operator to propagate
ψ0state = randstate(full_system);
ψ0op = dm(ψ0state);  # assumed definition; the original snippet left ψ0op undefined
# hamiltonian
H_sum_tensor_sparse = let ops1, ops2, n=length(sub_system), r=0.5
    ops1 = [Operator(sub_system, sprandn(ComplexF64,n,n,r)) for k=1:number_of_sub_systems]
    ops1 = ops1.+dagger.(ops1)  # make each single-site term Hermitian
    ops2 = [Operator(sub_system, sprandn(ComplexF64,n,n,r)) for k=1:number_of_sub_systems-1, j=1:2]
    ops2 = ops2.+dagger.(ops2)
    LazySum((LazyTensor(full_system, j, op) for (j,op)=enumerate(ops1))...)+
     LazySum((LazyTensor(full_system, (j,j+1), tuple(op...)) for (j,op)=enumerate(eachrow(ops2)))...)
end

Ht(H) = (_,_)->H # just because schroedinger won't propagate operators

# propagate state
@time timeevolution.schroedinger_dynamic((0.0, 1.0), ψ0state, Ht(H_sum_tensor_sparse))
@time timeevolution.schroedinger_dynamic((0.0,10.0), ψ0state, Ht(H_sum_tensor_sparse))
# propagate operator
@time timeevolution.schroedinger_dynamic((0.0, 1.0), ψ0op, Ht(H_sum_tensor_sparse))
@time timeevolution.schroedinger_dynamic((0.0,10.0), ψ0op, Ht(H_sum_tensor_sparse))
;

Which for the current release gives:

0.239708 seconds (18.42 k allocations: 1.503 MiB)
1.644933 seconds (168.87 k allocations: 11.341 MiB)
0.356556 seconds (8.47 k allocations: 2.256 MiB)
3.014540 seconds (74.08 k allocations: 8.263 MiB)

And for this branch gives:

0.194773 seconds (11.43 k allocations: 842.953 KiB)
1.789770 seconds (108.41 k allocations: 5.263 MiB)
0.359107 seconds (123 allocations: 1.492 MiB)
3.327338 seconds (123 allocations: 1.492 MiB)

@amilsted
Collaborator

Curious that the operator case allocations go down way more than the state case. Do you know why?

@AmitRotem
Contributor Author

Benchmark notebooks: release / this branch

@AmitRotem
Contributor Author

The only difference between Ket and Operator is the reshape of Ket.data to a Matrix.

@AmitRotem
Contributor Author

Also, in the notebooks you can see that propagating an operator with a dense Hamiltonian has lots of allocations, whereas propagating a state does not. Is that a LinearAlgebra issue?

@amilsted
Collaborator

> The only difference between Ket and Operator is the reshape of Ket.data to a Matrix.

Maybe it's a ReshapedArray() allocation...
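As a quick sanity check on that idea (my own illustration, not from the PR):

```julia
v = zeros(ComplexF64, 441)   # e.g. the data vector of a Ket on a 21^2-dimensional basis

reshape(v, 21, 21) isa Array                        # true: reshaping an Array yields another Array sharing memory
reshape(view(v, :), 21, 21) isa Base.ReshapedArray  # true: reshaping a non-Array wrapper yields a ReshapedArray

# Even the Array case is not free: the new Array header is itself a small heap allocation.
@allocated reshape(v, 21, 21)   # small but nonzero
```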

> Also, in the notebooks you can see that propagating an operator with a dense Hamiltonian has lots of allocations, whereas propagating a state does not. Is that a LinearAlgebra issue?

This is JuliaLang/julia#46865, which I have been trying to get fixed!

@AmitRotem
Contributor Author

AmitRotem commented Feb 25, 2023

This reduces the allocations for Ket. Not sure if it will be this simple for Bra.

But it increases the allocations for a LazySum of LazyTensor of dense operators by a bit.
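One way to compare the Ket and Bra paths directly (my own illustration; the bases, operators, and use of LinearAlgebra.mul! here are arbitrary choices, not taken from the PR's tests):

```julia
using QuantumOptics, BenchmarkTools, LinearAlgebra

b = FockBasis(20)
lop = LazyTensor(b ⊗ b, [1, 2], [destroy(b), create(b)])
ψ = basisstate(b, 5) ⊗ basisstate(b, 5)
out_ket = copy(ψ)
out_bra = dagger(out_ket)
ψbra = dagger(ψ)

@btime mul!($out_ket, $lop, $ψ)     # Ket path: out = lop * ψ
@btime mul!($out_bra, $ψbra, $lop)  # Bra path: out = ψ' * lop
```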

@amilsted
Collaborator

I think the latest changes to avoid reshaping may be broken, but otherwise this looks good.
Update: Not broken. Now just suggesting we modify the Bra version too and relax to AbstractArray.

Three review threads on src/operators_lazytensor.jl (all resolved, one outdated).
@AmitRotem
Contributor Author

Added a dimension check related to #74.

Also, fixed size of AbstractOperator.
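For context, a hedged sketch of the kind of dimension check meant here (the helper name and error messages are mine, not the PR's code): verify that the shapes are compatible before calling the in-place kernels.

```julia
# Hypothetical helper, not the PR's exact code: validate shapes before an in-place
# multiply so that mismatched inputs raise DimensionMismatch instead of writing out
# of bounds or silently producing garbage.
function _check_mul_dims(result, a, b)
    size(a, 2) == size(b, 1) ||
        throw(DimensionMismatch("cannot multiply: size(a) = $(size(a)), size(b) = $(size(b))"))
    (size(result, 1), size(result, 2)) == (size(a, 1), size(b, 2)) ||
        throw(DimensionMismatch("result has size $(size(result)), expected $((size(a, 1), size(b, 2)))"))
    nothing
end
```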

Review thread on src/operators.jl (resolved, outdated).
@amilsted merged commit 2506d0d into qojulia:master on Mar 27, 2023