Add dispatch routines for `ldiv!` with transposed LU and transposed or Hermitian/Symmetric `B` #55760

aravindh-krishnamoorthy · 2024-09-12T15:03:52Z

Introduction

This PR is a possible fix for the issue #55422, where it is observed that, with LU factorisation, rdiv! is much slower than ldiv!, see the issue for details and analysis.

This PR adds four new dispatch routines (methods) for ldiv! with LU: for Adjoint, Transpose, Hermitian, and Symmetric.

Test Suite

using LinearAlgebra, BenchmarkTools
id(A,T) = T(Matrix{eltype(A)}(I, size(A)...))

A = randn(1000,1000);

@btime inv($A);
@btime ldiv!(lu(A), $(id(A,Matrix)));
@btime rdiv!($(id(A,Matrix)), lu(A));
@btime rdiv!($(id(A,Symmetric)), lu(A));

inv(A) ≈ ldiv!(lu(A), id(A,Matrix)) ≈ rdiv!(id(A,Matrix), lu(A)) ≈ rdiv!(id(A,Symmetric), lu(A))

A = complex.(randn(1000,1000),randn(1000,1000));

@btime inv($A);
@btime ldiv!(lu(A), $(id(A,Matrix)));
@btime rdiv!($(id(A,Matrix)), lu(A));
@btime rdiv!($(id(A,Hermitian)), lu(A));

inv(A) ≈ ldiv!(lu(A), id(A,Matrix)) ≈ rdiv!(id(A,Matrix), lu(A)) ≈ rdiv!(id(A,Hermitian), lu(A))

Test Results

See #55760 (comment)

aravindh-krishnamoorthy · 2024-09-12T16:26:31Z

Test Results

julia> using LinearAlgebra, BenchmarkTools
julia> id(A,T) = T(Matrix{eltype(A)}(I, size(A)...))
id (generic function with 1 method)

julia> A = randn(1000,1000);

julia> @btime inv($A);
  29.407 ms (10 allocations: 8.13 MiB)
julia> @btime ldiv!(lu(A), $(id(A,Matrix)));
  35.406 ms (7 allocations: 7.64 MiB)
julia> @btime rdiv!($(id(A,Matrix)), lu(A));
  32.574 ms (11 allocations: 15.27 MiB)
julia> @btime rdiv!($(id(A,Symmetric)), lu(A));
  32.137 ms (12 allocations: 15.27 MiB)

julia> inv(A) ≈ ldiv!(lu(A), id(A,Matrix)) ≈ rdiv!(id(A,Matrix), lu(A)) ≈ rdiv!(id(A,Symmetric), lu(A))
true

julia> A = complex.(randn(1000,1000),randn(1000,1000));

julia> @btime inv($A);
  66.822 ms (10 allocations: 16.24 MiB)
julia> @btime ldiv!(lu(A), $(id(A,Matrix)));
  96.932 ms (7 allocations: 15.27 MiB)
julia> @btime rdiv!($(id(A,Matrix)), lu(A));
  113.056 ms (11 allocations: 30.53 MiB)
julia> @btime rdiv!($(id(A,Hermitian)), lu(A));
  93.395 ms (12 allocations: 30.53 MiB)

julia> inv(A) ≈ ldiv!(lu(A), id(A,Matrix)) ≈ rdiv!(id(A,Matrix), lu(A)) ≈ rdiv!(id(A,Hermitian), lu(A))
true

dkarrasch · 2024-09-12T21:32:18Z

Wait, but these are not in-place, are they? The assumption is that ldiv! is acting on the second argument, without allocating new memory.

aravindh-krishnamoorthy · 2024-09-12T23:49:04Z

Wait, but these are not in-place, are they? The assumption is that ldiv! is acting on the second argument, without allocating new memory.

Good evening, @dkarrasch. You're right. Unfortunately, these are not in-place. The original implementation used generic matrix functions, which were about $50$ times slower. On the other hand, the LAPACK versions cannot apply transpose on the second matrix B and so require a copy. Hence, the choice is between a slower implementation and one that has to copy. I chose the latter.

If you have any suggestions for in-place expansion of B, I will be very happy to incorporate them. Unfortunately, I couldn't find any other way other than copy(B) and Matrix(B) to get rid of Transpose and Symmetric/Hermitian, respectively, so that B can be fed as the second matrix to LAPACK.

martinholters · 2024-09-13T06:05:40Z

Could maybe e.g. replace the copy(B) for Transpose with _transpose!(B.parent) where

function _transpose!(M::AbstractMatrix)
    Mt = reshape(M, reverse(size(M)))
    for i in axes(M, 1), j in firstindex(M, 2)+i:lastindex(M, 2)
        (Mt[j,i], M[i,j]) = (transpose(M[i,j]), transpose(Mt[j,i]))
    end
    return Mt
end

And similar for Hermitian and maybe Symmetric.

Have't thought this through completely, but maybe its an idea to work with.

EDIT: The indexing is horribly wrong; this doesn't do a proper transpose.

dkarrasch · 2024-09-13T08:26:17Z

The "promise" of ldiv! is not to be fast, but to operate in-place. The usage pattern is like this:

A = factorize(rand(3,3)) # for instance
B = rand(3, 5)
ldiv!(A, B)
# continue to work with B

If this wasn't in-place, you would need C = ldiv!(A, B), to have a handle for the result, which users don't expect. So they would miss the result, and believe the result is stored in B. It does happen already that in-place methods are slower than allocating ones: e.g. that is the case for mul! vs. * with different BLAS-eltypes.

martinholters · 2024-09-13T08:42:40Z

I think my proposal should in principle achieve this, as the reshaped array shares memory with the original one. I'll try to fix the indexing to actually do a transposition and then give a more complete implementation of what I have in mind.

aravindh-krishnamoorthy · 2024-09-13T09:00:48Z

The "promise" of ldiv! is not to be fast, but to operate in-place. The usage pattern is like this:
A = factorize(rand(3,3)) # for instance B = rand(3, 5) ldiv!(A, B)
# continue to work with B
If this wasn't in-place, you would need C = ldiv!(A, B), to have a handle for the result, which users don't expect. So they would miss the result, and believe the result is stored in B. It does happen already that in-place methods are slower than allocating ones: e.g. that is the case for mul! vs. * with different BLAS-eltypes.

Thank you, @dkarrasch. I fully agree that in the final version, B must be in-place. To this end, I will evaluate various options of in-place transposition and filling, including the proposal by @martinholters, and other options in generic_matmul.

I feel that it may be best to use LAPACK for rdiv! and ldiv! when the type is BlasComplex or BlasFloat. Would you agree with this assessment?

dkarrasch · 2024-09-13T09:11:10Z

Would you agree with this assessment?

It depends. When there are different BLAS-types invovled, there are no such BLAS-methods. So we will need to fall back to generic methods. The point is that we have two orthogonal goals: performance vs memory allocation. When users don't mind the memory allocation, then they can use \, and that may in fact promote eltypes or materialize a wrapper, to finally reach a faster method.

martinholters · 2024-09-13T09:45:35Z

So my _transpose! from #55760 (comment) is wrong for rectangular matrices. Indeed, doing an inplace transposition (such that the reshape that gives the correct transpose) is trickier then I though at first as it is more than just pairwise swapping. Not sure there exists a nice solution, but assume there does for the moment, then what I have in mind is something along these lines

function ldiv!(A::TransposeFactorization{T,<:LU{T,<:StridedMatrix}}, B::Transpose{T,<:StridedVecOrMat{T}}) where {T<:Union{BlasFloat,BlasComplex}}
    Bt = reshape(B.parent, size(B))
    copyto!(Bt, copy(B)) # see below
    LAPACK.getrs!('T', A.parent.factors, A.parent.ipiv, Bt)
    copyto!(Bt, copy(B)) # see below
    return B
end

If there is a way to express the copyto!(Bt, copy(B)) (noting the aliasing of Bt and B) as an inplace operation permuting the elements, this could be an attractive alternative.

aravindh-krishnamoorthy · 2024-09-13T10:14:10Z

Would you agree with this assessment?

It depends. When there are different BLAS-types invovled, there are no such BLAS-methods. So we will need to fall back to generic methods. The point is that we have two orthogonal goals: performance vs memory allocation. When users don't mind the memory allocation, then they can use \, and that may in fact promote eltypes or materialize a wrapper, to finally reach a faster method.

Thank you, @dkarrasch. I agree with your response. If you feel it is not worth taking forward, perhaps we can make it clear in the rdiv! documentation that it might be up to $50\mathrm{x}$ slower than ldiv! even for BLAS types. Alternatively, we can move the copy to rdiv! and explain in the documentation about the allocation when A is not a transpose-type and eltypes are BLAS types.

I understand the surprise of OP and the poster on Discourse when rdiv! is $50\mathrm{x}$ slower than ldiv!. I feel that this will be a repeat issue that will be raised several times as users discover it. Hence, it may be better to fix this issue permanently.

dkarrasch · 2024-09-13T12:09:47Z

It turns out this was a silly dispatch issue, so this PR should be superseded by #55764.

aravindh-krishnamoorthy force-pushed the lu-55422 branch 5 times, most recently from dc5188a to 10445d1 Compare September 12, 2024 16:14

Add dispatch routines for ldiv! with transposed LU

47c7506

aravindh-krishnamoorthy force-pushed the lu-55422 branch from 10445d1 to 47c7506 Compare September 12, 2024 18:17

Tests for the dispatch routines for ldiv! with transposed LU

1537871

dkarrasch added the linear algebra Linear algebra label Sep 12, 2024

aravindh-krishnamoorthy closed this Sep 13, 2024

aravindh-krishnamoorthy deleted the lu-55422 branch September 13, 2024 14:08

aravindh-krishnamoorthy mentioned this pull request Sep 28, 2024

rdiv! on LU object is much slower than ldiv! #55422

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add dispatch routines for `ldiv!` with transposed LU and transposed or Hermitian/Symmetric `B` #55760

Add dispatch routines for `ldiv!` with transposed LU and transposed or Hermitian/Symmetric `B` #55760

aravindh-krishnamoorthy commented Sep 12, 2024 •

edited

Loading

aravindh-krishnamoorthy commented Sep 12, 2024 •

edited

Loading

dkarrasch commented Sep 12, 2024

aravindh-krishnamoorthy commented Sep 12, 2024

martinholters commented Sep 13, 2024 •

edited

Loading

dkarrasch commented Sep 13, 2024 •

edited

Loading

martinholters commented Sep 13, 2024

aravindh-krishnamoorthy commented Sep 13, 2024

dkarrasch commented Sep 13, 2024

martinholters commented Sep 13, 2024

aravindh-krishnamoorthy commented Sep 13, 2024

dkarrasch commented Sep 13, 2024

Add dispatch routines for ldiv! with transposed LU and transposed or Hermitian/Symmetric B #55760

Add dispatch routines for ldiv! with transposed LU and transposed or Hermitian/Symmetric B #55760

Conversation

aravindh-krishnamoorthy commented Sep 12, 2024 • edited Loading

Introduction

Test Suite

Test Results

aravindh-krishnamoorthy commented Sep 12, 2024 • edited Loading

Test Results

dkarrasch commented Sep 12, 2024

aravindh-krishnamoorthy commented Sep 12, 2024

martinholters commented Sep 13, 2024 • edited Loading

dkarrasch commented Sep 13, 2024 • edited Loading

martinholters commented Sep 13, 2024

aravindh-krishnamoorthy commented Sep 13, 2024

dkarrasch commented Sep 13, 2024

martinholters commented Sep 13, 2024

aravindh-krishnamoorthy commented Sep 13, 2024

dkarrasch commented Sep 13, 2024

Add dispatch routines for `ldiv!` with transposed LU and transposed or Hermitian/Symmetric `B` #55760

Add dispatch routines for `ldiv!` with transposed LU and transposed or Hermitian/Symmetric `B` #55760

aravindh-krishnamoorthy commented Sep 12, 2024 •

edited

Loading

aravindh-krishnamoorthy commented Sep 12, 2024 •

edited

Loading

martinholters commented Sep 13, 2024 •

edited

Loading

dkarrasch commented Sep 13, 2024 •

edited

Loading