Crash on 2.8 GB data: "EXCEPTION: Pool allocation failed" #9

Open
KirillKryukov opened this issue Nov 14, 2019 · 5 comments

@KirillKryukov

KirillKryukov commented Nov 14, 2019

Leon compression crashes on some data. Example data:

leon-repro-1.fa.gz (784 MB archive, inside is a 2.8 GB file).

Command to reproduce (after decompressing the gzipped data):

leon -seq-only -file leon-repro-1.fa -c -kmer-size 3

This command crashes with the following console output:

        Input format: Fasta
[DSK: Pass 1/1, Step 2: counting kmers   ]  70.5 %   elapsed:   0 min 28 sec   remaining:   0 min 12 sec   cpu: 472.6 %   mem: [  66,   66,   66] MB
EXCEPTION: Pool allocation failed for 3012690144 bytes (kmers alloc). Current usage is 16 and capacity is 2097152000

Also, after the crash Leon leaves 85 temporary files in the current directory, totaling 21 GB.

I noticed that the Leon paper mentions using Leon on 733 GB of data, so I assumed that a comparatively small input of 2.8 GB should be no problem.
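
The two byte counts in the error message are easier to compare in megabytes. The Python sketch below parses the message quoted above and converts both numbers. The helper name `parse_pool_error` is made up for illustration, and treating a megabyte as 1024 × 1024 bytes is an assumption about how the pool capacity is sized, not something the output states.

```python
import re

# Error text copied verbatim from the console output above.
message = (
    "EXCEPTION: Pool allocation failed for 3012690144 bytes (kmers alloc). "
    "Current usage is 16 and capacity is 2097152000"
)

MIB = 1024 * 1024  # assumption: sizes are counted in units of 1024 * 1024 bytes


def parse_pool_error(text):
    """Return (requested_bytes, capacity_bytes) extracted from the message."""
    match = re.search(r"failed for (\d+) bytes.*capacity is (\d+)", text)
    if match is None:
        raise ValueError("unrecognized error message")
    return int(match.group(1)), int(match.group(2))


requested, capacity = parse_pool_error(message)
print(f"requested: {requested / MIB:.0f} MiB")
print(f"capacity:  {capacity / MIB:.0f} MiB")
print(f"shortfall: {(requested - capacity) / MIB:.0f} MiB")
```

Under that assumption the capacity is exactly 2000 MiB, while the failed request is roughly 2873 MiB, so a single allocation asked for more than the whole pool.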

@rchikhi
Member

rchikhi commented Nov 14, 2019

Hi Kirill, you could try increasing the default -max-memory value and see if it still crashes. A k-mer size of 3 also seems problematic. Was that a typo?
Rayan

@KirillKryukov
Author

@rchikhi, how do I use -max-memory? Is it a command line option of leon? Is it set in bytes, kilobytes, megabytes or gigabytes? It's not mentioned in the leon console output:

[leon options]
       -file         (1 arg) :    input file (e.g. FASTA/FASTQ for compress or .leon file for decompress)
       -c            (0 arg) :    compression
       -d            (0 arg) :    decompression
       -nb-cores     (1 arg) :    number of cores (default is the available number of cores)  [default '0']
       -verbose      (1 arg) :    verbosity level  [default '1']
       -lossless     (0 arg) :    switch to lossless compression for qualities (default is lossy. lossy has much higher compression rate, and the loss is in fact a gain. lossy is better!)

   [compression options]
          -kmer-size               (1 arg) :    size of a kmer  [default '31']
          -abundance               (1 arg) :    abundance threshold for solid kmers (default inferred)  [default '']
          -seq-only                (0 arg) :    store dna seq only, header and quals are discarded, will decompress to fasta (same as -noheader -noqual)
          -noheader                (0 arg) :    discard header
          -noqual                  (0 arg) :    discard quality scores

3 is not a typo. Is there a known problem with this setting?

@rchikhi
Member

rchikhi commented Nov 18, 2019

I'm sorry, I thought leon exposed this parameter, but it does not. If it still accepts it, the value is in megabytes, so try e.g. -max-memory 10000.
A k-mer size of 3 is very problematic. What's your rationale for it? Leon should perform well with k-mers long enough to be specific, e.g. likely above 12 or 15, preferably in the range [20;50].
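
To make the point about specificity concrete, the Python sketch below counts how many distinct k-mers exist over the four-letter DNA alphabet for a few values of k. It is only a counting argument (4^k possible k-mers) and says nothing about Leon's internals; the function name `distinct_kmers` is hypothetical.

```python
def distinct_kmers(k, alphabet_size=4):
    """Number of different k-mers over an alphabet of the given size."""
    return alphabet_size ** k


# k = 3 is the value from the failing command; 31 is Leon's documented default.
for k in (3, 12, 15, 20, 31):
    print(f"k = {k:2d}: {distinct_kmers(k):,} possible k-mers")
```

With k = 3 there are only 64 possible k-mers, so in a multi-gigabyte input every one of them occurs a huge number of times and none is specific to a location. At k = 12 there are already about 16.8 million, and at k = 15 about 1.07 billion.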

@KirillKryukov
Author

Thank you @rchikhi , I will try this parameter and let you know if it helped.

As for the rationale, I am doing a parameter sweep to find the optimal kmer size for various kinds of data. Since Leon accepts kmer sizes starting from 2, and since until now I haven't seen any recommendation to avoid small kmer sizes (maybe I missed it?), I am testing the entire range. Now, thanks to your very helpful advice, I can probably ignore kmer sizes smaller than 12, if I understand you correctly?

@rchikhi
Member

rchikhi commented Nov 20, 2019

yes
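
A parameter sweep restricted to the range discussed above could be set up along the following lines. This is a minimal Python sketch that only builds and prints the command lines. It reuses the flags from the reproduction command in this issue; the bounds 12 and 50 are taken from the advice in the thread, and the helper name `leon_compress_command` is hypothetical.

```python
import shlex


def leon_compress_command(fasta_path, kmer_size):
    """Build the argument list for one compression run (flags as used in this issue)."""
    return ["leon", "-seq-only", "-file", fasta_path, "-c", "-kmer-size", str(kmer_size)]


# Assumed sweep bounds: skip k < 12 and stop at the top of the suggested [20;50] range.
MIN_K, MAX_K = 12, 50

commands = [leon_compress_command("leon-repro-1.fa", k) for k in range(MIN_K, MAX_K + 1)]

for cmd in commands:
    # Print rather than execute, so each run can be inspected or scheduled separately.
    print(shlex.join(cmd))
```

Each printed line can then be run by hand or passed to a job scheduler; running the commands directly from Python with `subprocess.run` would work as well, at the cost of one long-running script.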
