
parallel LU factorization memory leak #15450

Closed
thraen opened this issue Mar 11, 2016 · 8 comments · Fixed by #30425
Labels
linear algebra Linear algebra parallelism Parallel or distributed computation sparse Sparse arrays

Comments

@thraen

thraen commented Mar 11, 2016

I encountered a memory leak when solving linear systems with an LU factorization in parallel:

```julia
const m = 100
const n = 100

function doit()
    x = speye(m*n)
    X = lufact(x)

    p = rand(m, n)

    for i = 1:10000
        @sync @parallel for t = 1:100
            ax = X \ p[:]
        end
        println(i)
        # @everywhere gc()
    end
end
doit()
```
@jiahao
Member

jiahao commented Mar 11, 2016

What do you mean by "memory leak"? Did you get an error? Did the code not do what you expected?

@andreasnoack
Member

I can confirm this. Sparse factorizations cannot usually be moved across workers, since the pointer is set to zero during the serializing process. However, in contrast to the Cholesky, the sparse LU recomputes the factorization instead of failing when it is detected that the pointer is null. I'm still not sure why this causes a leak.
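The mechanism can be illustrated directly. The following is a sketch against the Julia-0.4-era API used above (`speye`/`lufact` were later replaced by `sparse(1.0I, ...)`/`lu`), and `numeric`/`symbolic` are internal fields of the `UmfpackLU` wrapper, so treat this as illustrative rather than exact:

```julia
# Sketch: serializing an UmfpackLU zeroes its raw UMFPACK pointers.
x = speye(100)
F = lufact(x)        # F.symbolic / F.numeric point into UMFPACK-owned memory

io = IOBuffer()
serialize(io, F)     # the serializer writes null pointers instead
seekstart(io)
G = deserialize(io)

# G.numeric is now C_NULL, so the next solve detects the null pointer
# and silently recomputes the whole factorization.
G \ ones(100)
```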

@tkelman
Contributor

tkelman commented Mar 11, 2016

in contrast to the Cholesky, the sparse LU recomputes the factorization instead of failing when it is detected that the pointer is null

Is that difference intentional? Seems like we should stick to one or the other.

@thraen
Author

thraen commented Mar 11, 2016

Yes, and I think there should be a warning in the case of recomputing. I forgot to mention that the leak also happens when I define the LU factorization everywhere like this:

```julia
@everywhere const m = 100
@everywhere const n = 100
@everywhere const x = speye(m*n)
@everywhere const X = lufact(x)

function doit()
    p = rand(m, n)
    for i = 1:10000
        @sync @parallel for t = 1:100
            ax = X \ p[:]
        end
        println(i)
    end
end
doit()
```

I wasn't aware that even then the LU factorization is moved across workers (and recomputed on them).
In case somebody has the same problem: to prevent this, wrap the solve in a function defined on every worker:

```julia
@everywhere const m = 100
@everywhere const n = 100
@everywhere const x = speye(m*n)
@everywhere const X = lufact(x)
@everywhere function solveLU(b)
    return X \ b
end

function doit()
    p = rand(m, n)
    for i = 1:10000
        @sync @parallel for t = 1:100
            solveLU(p[:])
        end
        println(i)
    end
end
doit()
```

@andreasnoack
Member

Is that difference intentional?

I don't think so. UMFPACK and CHOLMOD have slightly different designs, but our wrappers were also written by different people. The present memory management model in the CHOLMOD wrappers is one I introduced out of necessity, but I've contributed almost nothing to the UMFPACK wrappers.

It might be possible to recompute a CHOLMOD.Factor after serialization, but that would require storing the original sparse matrix and some meta information with the factorization. That is what UmfpackLU does, but I also think that UMFPACK has less global state. CHOLMOD has the common struct that controls whether the factorization is a Cholesky or an LDLt. We would need to carry such information explicitly in the type as well to get recomputation after serialization working.

The usual tradeoffs between generality and hidden slowness also apply here. It might be possible to avoid the error when moving sparse factorizations, but it will be extremely inefficient to move and recompute the factorization on the new workers. Users probably use @parallel to get a speedup, but they won't get one in this case, and it will be tricky to figure out why.

...but back to this issue. I think I have an idea about what is causing the memory leak and will get back with an update on that.

@tkelman
Contributor

tkelman commented Mar 12, 2016

Maybe erroring instead of recomputing in the LU case would be a better option (and more consistent with Cholesky) than quietly re-factorizing.

@kshyatt kshyatt added linear algebra Linear algebra sparse Sparse arrays parallelism Parallel or distributed computation labels Jul 28, 2016
@ViralBShah
Member

@andreasnoack Was this resolved?

@andreasnoack
Member

andreasnoack commented Dec 17, 2018

No, this is still an issue. The problem is that we don't register a finalizer when deserializing the factorization, so the memory allocated when the factorization is recomputed is never released. The issue can therefore be reproduced with just

```julia
julia> using SparseArrays, SuiteSparse, LinearAlgebra, Serialization

julia> b = IOBuffer();

julia> F = lu(sparse(1.0I, 10000, 10000));

julia> foreach(1:1000) do i
         seekstart(b)
         serialize(b, F)
         seekstart(b)
         SuiteSparse.UMFPACK.umfpack_symbolic!(deserialize(b))
       end;
```

I think there are three possible solutions:

  1. Throw on serialization to avoid leaks. It also has the benefit of revealing the costly recomputation of the factorization which is kind of hidden in the original issue. This solution would be breaking, though, so probably not the best solution.
  2. Define a custom deserialization method that registers a finalizer. This should get rid of the leak but will still hide the costly recomputation.
  3. Define a proper serialization for these objects. This would probably require that we match the C structs with Julia structs and on deserialization copy the content to memory not managed by the GC. This might be slightly tricky.

I suspect we should start with 2 and then open a separate issue to track the development of 3.
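A minimal sketch of option 2 might look like the following. This is hypothetical, not the actual fix: `umfpack_free_symbolic` is the cleanup routine the stdlib UMFPACK wrapper registers as a finalizer in `lu` (it also frees the numeric factorization), and a real patch would more likely specialize `Serialization.deserialize` for the type rather than use a helper function:

```julia
# Hypothetical sketch: re-register the finalizer that lu() normally
# installs, so memory from a recomputed factorization is eventually freed.
using Serialization, SuiteSparse

function deserialize_lu(io::IO)
    F = deserialize(io)::SuiteSparse.UMFPACK.UmfpackLU
    finalizer(SuiteSparse.UMFPACK.umfpack_free_symbolic, F)
    return F
end
```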

KristofferC pushed a commit that referenced this issue Jan 11, 2019
KristofferC pushed a commit that referenced this issue Feb 4, 2019
KristofferC pushed a commit that referenced this issue Feb 11, 2019
KristofferC pushed a commit that referenced this issue Apr 20, 2019
KristofferC pushed a commit that referenced this issue Feb 20, 2020