
GenotypeGVCFs memory issues on GATK 4.6.0.0 #8918

Open

jin0008 opened this issue Jul 17, 2024 · 15 comments

@jin0008
jin0008 commented Jul 17, 2024

Bug Report

Affected tool(s) or class(es)

GenotypeGVCFs

Affected version(s)

4.6.0.0

Description

When I ran GenotypeGVCFs on a GenomicsDB workspace of 420 samples, the process was interrupted due to significant memory issues; it consumed memory continuously. I ran the same process in 4.5.0.0 and confirmed that it works fine.

@gokalpcelik
Contributor

Can you provide the logs that show the error message?

@jin0008
Author

jin0008 commented Jul 19, 2024 via email

@gokalpcelik
Contributor

Can you provide more details on what operating system you are using and other related information, such as the Java version?

Even if the process gets interrupted by the system, there should be a Java segfault message thrown by the process at some point. Did you observe any files named ERR around the output file?

@jin0008
Author

jin0008 commented Jul 19, 2024 via email

@gokalpcelik
Contributor

Can you tell us how large your heap size is for this task? (-Xmx? -Xms?)
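For context, GATK's Java heap flags are passed through the gatk wrapper's --java-options. A minimal sketch, with placeholder file names and a hypothetical workspace name:

gatk --java-options "-Xmx8g -Xms8g" GenotypeGVCFs \
    -R ref.fa \
    -V gendb://genomicsdb_workspace \
    -O output.vcf.gz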

@broadinstitute deleted a comment from SaarGirl Aug 2, 2024
@icemduru

I have a similar issue. Weirdly, -Xmx does not help.

@gokalpcelik
Contributor

@icemduru
Can you provide more details on your issue? How many samples do you have? How did you combine them, and what are your command lines for this process?
Can you provide more details on the system you are running these commands on?

GenotypeGVCFs is not known to have memory-leak issues. Our tests indicated that it only needs around 4–6 GB of total memory to genotype 120 whole-genome samples (per contig).

@icemduru

Thanks for the reply. I have 370 samples. I ran HaplotypeCaller on each of them, then ran GenomicsDBImport for each chromosome (it is a plant genome, about 420 Mb in total). Then I tried to run GenotypeGVCFs for each chromosome. I have attached the log file for chr1.
slurm-22616776.out_text.txt

@gokalpcelik
Contributor

Hi @icemduru
It looks like your Slurm workload manager is configured with a limit of 48 GB of maximum process memory per execution. Your Java instance is set with -Xmx45G, which covers most of this limit and leaves only a handful of memory for the native GenomicsDB library. Native libraries allocate outside the heap, so it is better to set your -Xmx to a more sensible 8–12 GB and leave the rest of the memory for the native library to use.

Keep in mind that this memory limit on Slurm could be set per user, not per task, so you may need to run a single contig at a time, or perhaps two simultaneously. Otherwise, Slurm may interfere with all the tasks and cancel all your jobs.

One final reminder: we strongly recommend that users set the temporary directory to somewhere other than /tmp. The Slurm workload manager interferes with that location and sometimes causes premature termination of GATK processes because the extracted native library and accessory files get deleted.

I hope this helps.
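Putting that advice together, a sketch of the suggested per-chromosome invocation; the workspace name, reference, and tmp path are placeholders:

gatk --java-options "-Xmx10g" GenotypeGVCFs \
    -R ref.fa \
    -V gendb://chr1_workspace \
    -O chr1.vcf.gz \
    --tmp-dir /scratch/mytmp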

@icemduru

Thank you for your help, but unfortunately it didn't resolve the issue. I've already tried allocating 10GB of memory using the -Xmx10g flag and redirecting the temporary directory away from /tmp. However, GATK is still attempting to consume more than 48GB of RAM, resulting in the termination of my run.
slurm-22680938.out_text.txt

@gokalpcelik
Contributor

Hi again.
Did you add the --consolidate true parameter to GenomicsDBImport during the import stage? This step collapses each import layer into a single layer, which prevents tools from opening too many files at once, though it may take some time at the end of the import stage. It also reduces the amount of bookkeeping the genotyper has to do.
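For reference, a minimal sketch of an import with consolidation enabled; the sample map, interval, and workspace path are placeholders:

gatk --java-options "-Xmx8g" GenomicsDBImport \
    --sample-name-map samples.map \
    -L chr1 \
    --genomicsdb-workspace-path chr1_workspace \
    --consolidate true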

@icemduru

Hi,
Thanks for the suggestion. I used the --consolidate true parameter with GenomicsDBImport during the import stage, but it did not help. However, I solved my problem by using large-memory machines. For future reference, the required memory was 95.11 GB for the 370-sample dataset using -Xmx8G and --disable-bam-index-caching true.
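For reference, a sketch of the configuration just described, with hypothetical workspace and output names:

gatk --java-options "-Xmx8g" GenotypeGVCFs \
    -R ref.fa \
    -V gendb://chr1_workspace \
    -O chr1.vcf.gz \
    --disable-bam-index-caching true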

@Wangchangsh

Same problem. Any solution or update?

@gokalpcelik
Contributor

Hi @Wangchangsh
Yes, there is an update on this issue. We were able to reproduce this problem on our end, and it looks like there is a memory management issue somewhere in the GenomicsDB-related code inside GenotypeGVCFs.

Our temporary solution, until we make an updated release, is to convert imported GenomicsDB instances to GVCF using

gatk SelectVariants -V gendb://instancename -O GVCF_export.g.vcf.gz -R ref.fa -L whateverintervalusedinGDBimport

and then use this GVCF file as input for the GenotypeGVCFs tool. This keeps memory usage from climbing to unreasonable levels and avoids any apparent leaks.
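A sketch of the follow-up genotyping step on the exported GVCF; the file names mirror the command above:

gatk --java-options "-Xmx8g" GenotypeGVCFs \
    -R ref.fa \
    -V GVCF_export.g.vcf.gz \
    -O genotyped.vcf.gz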

I hope this helps.

Regards.

@Wangchangsh

Thank you for your prompt response. I used the -L parameter to split tasks into Mb-scale intervals to prevent memory issues.
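For illustration, each scattered job in that interval-splitting approach might look like the following; the interval, workspace, and file names are placeholders:

gatk --java-options "-Xmx8g" GenotypeGVCFs \
    -R ref.fa \
    -V gendb://chr1_workspace \
    -L chr1:1-5000000 \
    -O chr1_1-5000000.vcf.gz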
