
Memory error with many reference genomes as input #58

Open

dpellow opened this issue May 16, 2020 · 16 comments

dpellow commented May 16, 2020

I'm trying to run BCALM on a very large database and I keep getting errors that look like:

mem: [ 100, 91622, 92822] MB EXCEPTION: Pool reserve() failed for 576716800000 (+ 1600 extra for 36 cores) bytes

Is there some way to get around this? More or fewer cores? Lower -max-memory (I don't have much more RAM available)? A larger value for the -max-disk parameter?
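(For reference, the number in the error works out to exactly my memory budget: 576,716,800,000 bytes / 2^20 = 550,000, i.e. the value in MB I passed to -max-memory, so it looks like the pool is trying to reserve the whole memory budget as a single contiguous allocation - though I'm only guessing at the interpretation.)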

Thanks!

rchikhi (Member) commented May 16, 2020

How many distinct k-mers are reported at the beginning of the output?

dpellow (Author) commented May 16, 2020

I don't see k-mers reported in the standard output.
In the .h5 file there is a field that says estimated k-mer number: 69880669047 - just under 70 billion. (k=31; would trying a lower k help?)
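(In case it helps: I read that field with the standard HDF5 command-line tools, along the lines of h5dump out.h5 | grep -i kmer, where out.h5 is just a placeholder for the file bcalm produced.)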

rchikhi (Member) commented May 16, 2020

Can you please paste the full bcalm output?
Also, what's the genome size / fastq file size?
Bcalm on 70 billion distinct k-mers is no small feat (what organism is that?), and 92 GB of RAM would understandably be too low.

dpellow (Author) commented May 16, 2020

I've pasted it below.
It's for a metagenomic simulation - the concatenated reference genomes are ~70 Gbp, so this matches the k-mer count, assuming almost all k-mers are unique.
From what I understood, the error occurred with ~550 GB of RAM, which is what I passed to the -max-memory flag.
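(Back-of-the-envelope check: a reference of length L contributes L - k + 1 ≈ L k-mers for k = 31, so ~70 Gbp of concatenated genomes gives roughly 7 x 10^10 31-mers if nearly all of them are distinct, consistent with the ~69.9 billion estimate above.)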

[Approximating frequencies of minimizers ]  100  %   elapsed:   5 min 1  sec   remaining:   0 min 0  sec   cpu: 100.0 %   mem: [ 321,  321,  516] MB 
[DSK: Collecting stats on all_download...]  100  %   elapsed:   0 min 2  sec   remaining:   0 min 0  sec   cpu: 100.0 %   mem: [  92,   92,  516] MB 
[DSK: Pass 1/1, Step 1: partitioning     ]  20   %   elapsed:  10 min 38 sec   remaining:  42 min 30 sec   cpu: 1589.0 %   mem: [100293, 100293, 100293] 
[DSK: Pass 1/1, Step 1: partitioning     ]  21   %   elapsed:  11 min 4  sec   remaining:  41 min 37 sec   cpu: 1601.2 %   mem: [101318, 101318, 101318] 
[DSK: Pass 1/1, Step 1: partitioning     ]  22   %   elapsed:  11 min 30 sec   remaining:  40 min 46 sec   cpu: 1612.5 %   mem: [102247, 102247, 102247] 
[DSK: Pass 1/1, Step 1: partitioning     ]  23   %   elapsed:  11 min 56 sec   remaining:  39 min 57 sec   cpu: 1623.0 %   mem: [103117, 103117, 103117] 
[DSK: Pass 1/1, Step 1: partitioning     ]  24   %   elapsed:  12 min 22 sec   remaining:  39 min 9  sec   cpu: 1632.6 %   mem: [103934, 103934, 103934] 
[DSK: Pass 1/1, Step 1: partitioning     ]  25   %   elapsed:  12 min 48 sec   remaining:  38 min 24 sec   cpu: 1641.8 %   mem: [104709, 104709, 104709] 
[DSK: Pass 1/1, Step 1: partitioning     ]  26   %   elapsed:  13 min 14 sec   remaining:  37 min 41 sec   cpu: 1650.3 %   mem: [105418, 105418, 105418] 
[DSK: Pass 1/1, Step 1: partitioning     ]  27   %   elapsed:  13 min 41 sec   remaining:  36 min 58 sec   cpu: 1658.2 %   mem: [106074, 106074, 106074] 
[DSK: Pass 1/1, Step 1: partitioning     ]  28   %   elapsed:  14 min 7  sec   remaining:  36 min 17 sec   cpu: 1665.7 %   mem: [106658, 106658, 106658] 
[DSK: Pass 1/1, Step 1: partitioning     ]  29   %   elapsed:  14 min 33 sec   remaining:  35 min 37 sec   cpu: 1672.7 %   mem: [107197, 107197, 107197] 
[DSK: Pass 1/1, Step 1: partitioning     ]  30   %   elapsed:  14 min 59 sec   remaining:  34 min 58 sec   cpu: 1679.3 %   mem: [107646, 107646, 107646] 
[DSK: Pass 1/1, Step 1: partitioning     ]  31   %   elapsed:  15 min 25 sec   remaining:  34 min 19 sec   cpu: 1685.5 %   mem: [108080, 108080, 108080] 
[DSK: Pass 1/1, Step 1: partitioning     ]  32   %   elapsed:  15 min 51 sec   remaining:  33 min 41 sec   cpu: 1691.4 %   mem: [108456, 108456, 108456] 
[DSK: Pass 1/1, Step 1: partitioning     ]  33   %   elapsed:  16 min 18 sec   remaining:  33 min 5  sec   cpu: 1697.0 %   mem: [108779, 108779, 108779] 
[DSK: Pass 1/1, Step 1: partitioning     ]  34   %   elapsed:  16 min 43 sec   remaining:  32 min 28 sec   cpu: 1702.2 %   mem: [109079, 109079, 109079] 
[DSK: Pass 1/1, Step 1: partitioning     ]  35   %   elapsed:  17 min 10 sec   remaining:  31 min 52 sec   cpu: 1707.2 %   mem: [109336, 109336, 109336] 
[DSK: Pass 1/1, Step 1: partitioning     ]  36   %   elapsed:  17 min 36 sec   remaining:  31 min 17 sec   cpu: 1712.0 %   mem: [109566, 109566, 109566] 
[DSK: Pass 1/1, Step 1: partitioning     ]  37   %   elapsed:  18 min 2  sec   remaining:  30 min 42 sec   cpu: 1716.5 %   mem: [109763, 109763, 109763] 
[DSK: Pass 1/1, Step 1: partitioning     ]  38   %   elapsed:  18 min 28 sec   remaining:  30 min 8  sec   cpu: 1720.8 %   mem: [109920, 109920, 109920] 
[DSK: Pass 1/1, Step 1: partitioning     ]  39   %   elapsed:  18 min 54 sec   remaining:  29 min 34 sec   cpu: 1724.9 %   mem: [110064, 110064, 110064] 
[DSK: Pass 1/1, Step 1: partitioning     ]  40   %   elapsed:  19 min 20 sec   remaining:  29 min 0  sec   cpu: 1728.8 %   mem: [110191, 110191, 110191] 
[DSK: Pass 1/1, Step 1: partitioning     ]  41   %   elapsed:  19 min 48 sec   remaining:  28 min 30 sec   cpu: 1729.1 %   mem: [102528, 110191, 110194] 
[DSK: Pass 1/1, Step 1: partitioning     ]  42   %   elapsed:  20 min 19 sec   remaining:  28 min 3  sec   cpu: 1726.8 %   mem: [97109, 110191, 110194] $
[DSK: Pass 1/1, Step 1: partitioning     ]  43   %   elapsed:  20 min 50 sec   remaining:  27 min 37 sec   cpu: 1723.6 %   mem: [97199, 110191, 110194] $
[DSK: Pass 1/1, Step 1: partitioning     ]  44   %   elapsed:  21 min 21 sec   remaining:  27 min 10 sec   cpu: 1720.6 %   mem: [97268, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning     ]  45   %   elapsed:  21 min 54 sec   remaining:  26 min 46 sec   cpu: 1715.4 %   mem: [91628, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning     ]  46   %   elapsed:  22 min 30 sec   remaining:  26 min 25 sec   cpu: 1706.1 %   mem: [80606, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning     ]  47   %   elapsed:  23 min 8  sec   remaining:  26 min 5  sec   cpu: 1695.0 %   mem: [80651, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning     ]  48   %   elapsed:  23 min 53 sec   remaining:  25 min 52 sec   cpu: 1677.5 %   mem: [57260, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning     ]  49   %   elapsed:  24 min 56 sec   remaining:  25 min 57 sec   cpu: 1639.9 %   mem: [45398, 110191, 110194] M
[DSK: Pass 1/1, Step 2: counting kmers   ]  50   %   elapsed:  27 min 38 sec   remaining:  27 min 38 sec   cpu: 1509.7 %   mem: [  99, 110191, 110194] MB

EXCEPTION: Pool reserve() failed for 629145600000 (+ 1600 extra for 36 cores) bytes

rchikhi (Member) commented May 16, 2020

ahh thanks, there is something really odd.
What does the input look like? A multifasta file where each entry is a reference genome? What's the longest sequence in there?

dpellow (Author) commented May 16, 2020

Yes exactly - it's a multifasta of reference genomes.
The longest is 16 Mbp.

rchikhi (Member) commented May 16, 2020

That sounds quite reasonable, but I have never tried that sort of counting. The k-mer counter is really optimized for counting reads, or single reference genomes, not collections of reference genomes of various lengths, and its guess for the internal memory allocation parameters is probably thrown off by the varying genome sizes. What -max-memory value did you pass, 500000?

dpellow (Author) commented May 16, 2020

Yes, I've tried 500-650 GB, and also various values for -max-disk.
Are there parameter settings that you think would get it to work? Or another assembler that you would recommend?

rchikhi (Member) commented May 16, 2020

Yeah, I'm sorry it didn't work; I could look into it sometime. Meanwhile you could give https://github.com/pmelsted/bifrost a try for outputting unitigs!

dpellow (Author) commented May 16, 2020

Thanks

dpellow closed this as completed May 16, 2020
rchikhi (Member) commented May 16, 2020

I'm leaving this open; I would like to make bcalm able to process collections of genomes.

rchikhi reopened this May 16, 2020
dpellow (Author) commented May 17, 2020

OK - it could be a good idea, as a number of recent k-mer indexing/querying tools recommend using bcalm, not necessarily in the usual context of assembling sequencing reads:
https://github.com/kamimrcht/REINDEER
https://github.com/Malfoy/Blight

rchikhi (Member) commented May 17, 2020

REINDEER is not limited by the bug you point out, as it makes BCALM process one genome (= one color) after the other, i.e. smaller chunks at a time.
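For anyone wanting to emulate that "one genome at a time" strategy by hand, here is a minimal illustrative sketch (not REINDEER's actual code): it splits a multifasta into one file per record, so each genome can then be fed to bcalm separately. The genome_N.fa output names are arbitrary placeholders.

// Illustrative sketch only: split a multi-FASTA into one file per record,
// so that each genome can be processed separately (e.g. one bcalm run each).
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    if (argc < 2) { std::cerr << "usage: split_fasta <multifasta>\n"; return 1; }

    std::ifstream in(argv[1]);
    if (!in) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    std::ofstream out;
    std::string line;
    size_t count = 0;

    while (std::getline(in, line))
    {
        if (!line.empty() && line[0] == '>')        // header line => start a new per-genome file
        {
            if (out.is_open()) out.close();
            out.open("genome_" + std::to_string(count++) + ".fa");
        }
        if (out.is_open()) out << line << '\n';     // copy header and sequence lines verbatim
    }

    std::cout << "wrote " << count << " records\n";
    return 0;
}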

pashadag (Contributor) commented:

David, this may be a use case for TwoPaCo:

https://github.com/medvedevgroup/TwoPaCo

dpellow (Author) commented May 17, 2020

> David, this may be a use case for TwoPaCo:

Thanks! I'll look into it.

rchikhi (Member) commented May 20, 2020

I may have identified the issue with BCALM taking too much memory with many reference genomes. It can be temporarily fixed with the following change:

diff --git a/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp b/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp
index c87c6f8..3e05d36 100644
--- a/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp
+++ b/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp
@@ -1231,7 +1231,7 @@ void SortingCountAlgorithm<span>::fillPartitions (size_t pass, Iterator<Sequence
                        /** We launch the iteration of the sequences iterator with the created
                         * functors. */

-                       size_t groupSize = 1000;
+                       size_t groupSize = 10;
                        bool deleteSynchro = true;

(This reduces memory by 5x on my end.)
I will further study a more robust change, but meanwhile @dpellow you could try it if you'd like.
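(Rough intuition, if I'm reading the dispatcher right: fillPartitions hands the input sequences to the worker threads in batches of groupSize, so with reference genomes up to ~16 Mbp a batch of 1000 sequences can buffer on the order of 1000 x 16 Mbp ≈ 16 GB of sequence, and with 36 threads each holding a batch in flight the buffered input alone can reach hundreds of GB in the worst case; with groupSize = 10 that same worst case drops to ~160 MB per batch.)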

rchikhi changed the title from "Memory error" to "Memory error with many reference genomes as input" on Oct 28, 2022