
'Realloc' could not re-allocate memory #100

Closed
helaersr opened this issue Nov 24, 2021 · 7 comments

@helaersr

Hi,

I have an issue using QDNAseq with high-coverage WGS (~300x).
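For reference, bins comes from the standard QDNAseq bin annotations, roughly like this (a minimal sketch; the exact call is reconstructed from the log below, and the BAM path is a placeholder):

library(QDNAseq)
library(QDNAseq.hg38)

# 1 kbp hg38 bin annotations (matches the "bin size 1 kbp" reported in the log below)
bins <- getBinAnnotations(binSize = 1, genome = "hg38")
bam  <- "sample.bam"   # placeholder path to the ~300x WGS BAM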

Using this command:
readCounts <- binReadCounts(bins, bamfiles=bam)

throws this error

Loading Libraries...Loaded bin annotations for genome ‘hg38’, bin size 1 kbp, and experiment type ‘SR50’ from annotation package QDNAseq.hg38 v1.0.0
Calling QDNAseq...    sample (1 of 1): extracting reads ...Error in value[[3L]](cond) : 
  'Realloc' could not re-allocate memory (18446744065128005632 bytes)
  file: sample.bam
  index: NA
Calls: binReadCounts ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted

I never encountered this error with my lower-coverage WGS (e.g. 50x), but it happens systematically with all of the high-coverage samples.

I don't think it's a memory issue. The job has 200 GB of RAM available, and I monitored it: it grows to ~11 GB of RAM used before crashing. I also don't think it tries to allocate 16 EB of RAM :-), so the error message is probably misleading.

I haven't looked much at the code, but I suspect a problem linked to the number of reads exceeding what fits in a 32-bit integer. With 50x WGS we have around 1 billion reads (below the maximum int value of 2147483647), while with 300x WGS it's more than 6 billion reads. So if 32-bit integers are used for, e.g., storing the number of reads, I suppose we would get this kind of trouble.
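As a quick illustration of the 32-bit hypothesis (back-of-the-envelope numbers only, not taken from the QDNAseq code):

.Machine$integer.max       # 2147483647, the largest 32-bit signed integer in R
reads_50x  <- 1e9          # ~1 billion reads at 50x: fits in a 32-bit integer
reads_300x <- 6e9          # ~6 billion reads at 300x: does not fit
as.integer(reads_300x)     # NA, with a warning about coercion to integer range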

What do you think could be causing this issue?

@HenrikBengtsson
Collaborator

... 16 EB of RAM ...

Yeah, if so, we'll need to wait a decade or two :p

What do you think could be causing this issue?

I'm not sure exactly where this happens, but I suspect we're running out of memory in Rsamtools::scanBam().

If you specify chunkSize as in:

readCounts <- binReadCounts(bins, bamfiles = bam, chunkSize = 10e6)

then each BAM file will be read in chunks of 10 million base pairs, instead of the whole genome at once. Does that make a difference for you?

BTW, what's your sessionInfo()?

@helaersr
Author

sessionInfo() :

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] QDNAseq.hg38_1.0.0 QDNAseq_1.28.0

loaded via a namespace (and not attached):
 [1] parallelly_1.27.0      rstudioapi_0.13        DNAcopy_1.66.0
 [4] XVector_0.32.0         GenomicRanges_1.44.0   BiocGenerics_0.38.0
 [7] zlibbioc_1.38.0        IRanges_2.26.0         BiocParallel_1.26.1
[10] impute_1.66.0          GenomeInfoDb_1.28.1    globals_0.14.0
[13] tools_4.1.0            CGHcall_2.54.0         parallel_4.1.0
[16] Biobase_2.52.0         R.oo_1.24.0            marray_1.70.0
[19] matrixStats_0.59.0     digest_0.6.27          CGHbase_1.52.0
[22] crayon_1.4.1           GenomeInfoDbData_1.2.6 BiocManager_1.30.16
[25] codetools_0.2-18       R.utils_2.10.1         S4Vectors_0.30.0
[28] bitops_1.0-7           RCurl_1.98-1.3         future.apply_1.7.0
[31] limma_3.48.1           compiler_4.1.0         R.methodsS3_1.8.1
[34] Rsamtools_2.8.0        Biostrings_2.60.1      stats4_4.1.0
[37] future_1.21.0          listenv_0.8.0

When I try to use chunkSize = 10e6, I get this error:

Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) :
  Did you mean to create the future within a function?  Invalid future expression tries to use global '...' variables that do not exist: FUN()
Calls: binReadCounts ... getGlobalsAndPackagesXApply -> getGlobalsAndPackages
Execution halted

@HenrikBengtsson
Collaborator

When I try to use chunkSize = 10e6, I get this error:

Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) :
  Did you mean to create the future within a function?  Invalid future expression tries to use global '...' variables that do not exist: FUN()
Calls: binReadCounts ... getGlobalsAndPackagesXApply -> getGlobalsAndPackages
Execution halted

Update the future package to fix this. But I'd say update all of your packages, because several of them have been updated since you installed them.
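For example, one way to do it (BiocManager handles both CRAN and Bioconductor packages here):

# Update 'future' specifically, or simply update everything that is out of date:
install.packages("future")
BiocManager::install(update = TRUE, ask = FALSE)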

@helaersr
Author

Yes, indeed, it works better with up-to-date libraries ^_^!

And reading in chunks did the trick; it now works fine.

Thank you very much for your help!

@HenrikBengtsson
Collaborator

Thanks for confirming. So, I'm thinking of making read-in-chunks the new default to avoid these hiccups (#101). To do this, we need to set a sensible default for chunkSize.

  1. Did you end up using chunkSize = 10e6, or did you have to decrease that?
  2. What's the file sizes of those high coverage WGS (~300x) BAM files you used here?

Knowing that will help in figuring out a sensible default.

@helaersr
Author

  1. Yes, I used chunkSize = 10e6
  2. ~500 GB per BAM. I have 8 BAMs of this size, and all of them worked using chunkSize = 10e6

@HenrikBengtsson
Collaborator

HenrikBengtsson commented Nov 26, 2021

Thanks - this is helpful.

Back of the envelope calculation:

  1. chunkSize = 10e6 on the human genome gives 3e9 / 10e6 = 300 chunks
  2. 300 chunks of a 500 GB BAM file is ~1.7 GB of BAM data per chunk (see the sketch below)
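In R, the same arithmetic (numbers from this thread, purely illustrative):

genome_size <- 3e9                        # ~3 Gbp human genome
chunk_size  <- 10e6                       # chunkSize = 10e6
n_chunks    <- genome_size / chunk_size   # 300 chunks
bam_size_gb <- 500                        # ~500 GB BAM at ~300x coverage
bam_size_gb / n_chunks                    # ~1.7 GB of BAM data per chunk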
