GenotypeGVCFs memory issues on GATK 4.6.0.0 #8918
Can you provide the logs that show the error message?
There are no error messages. The process was interrupted without any error message.
I attached a screenshot, along with the chr14 variant calling (completed) and the chr14 variant calling
(interrupted).
In the system monitor, when I use GATK 4.6.0.0, the processes consume
memory continuously.
When memory usage reaches 512 GB, the process is interrupted.
I tried running this process on only 2-3 chromosomes (using the -L interval option) and found that the
process completed on chr14 but was interrupted on the
other two chromosomes.
So I rolled back to GATK 4.5.0.0, and the process ran normally. With that version I can run
GenotypeGVCFs on all chromosomes simultaneously.
My machine is a Dell 7865 workstation with 512 GB of memory and 64 cores (AMD Threadripper 5995WX).
Thanks
Jinu Han
On Fri, Jul 19, 2024 at 12:08 AM Gökalp Çelik wrote:
Can you provide the logs that show the error message?
Can you provide more details about the operating system you are using and other related information, such as the Java version? Even if the process is interrupted by the system, there should be a Java segfault message thrown by the process at some point. Did you see any files with ERR in their names near the output file?
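The ERR files asked about here are presumably the HotSpot JVM fatal-error logs, which by default are named hs_err_pid<pid>.log and written to the process's working directory when the JVM itself crashes. A minimal sketch for checking an output directory for them, assuming that default naming (find_jvm_crash_logs is a hypothetical helper):

```python
from pathlib import Path


def find_jvm_crash_logs(directory: str) -> list[Path]:
    """List HotSpot fatal-error logs (hs_err_pid<pid>.log) in `directory`, newest first.

    Assumes the JVM's default file name; a custom -XX:ErrorFile setting would not match.
    """
    logs = Path(directory).glob("hs_err_pid*.log")
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```

An empty result is consistent with the JVM being terminated from outside (for example with SIGKILL by the Linux OOM killer), because a process killed that way gets no chance to write the log.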
Hi,
The operating system is Ubuntu 20.04.
The Java version is OpenJDK "17.0.11".
Whenever a step of the GATK Best Practices workflow was interrupted before, I always saw
error messages.
This time, however, the process was interrupted without any message.
This is quite strange. I checked this on several other chromosomes.
My callset has about 430 samples.
I could run GenotypeGVCFs in GATK 4.5.0.0 without any problem.
In GATK 4.6.0.0, however, the process succeeded on only 3-4 chromosomes (the
smaller ones, I think). On the others, the process was interrupted
before completing.
I could not find any ERR files in the folder.
Thanks
Jinu Han
On Fri, Jul 19, 2024 at 7:01 PM Gökalp Çelik wrote:
Can you provide more details on what operating system you are using and
other related information such as java version etc?
Even if the process gets interrupted by the system there must be a java
segfault message at some point thrown by the process. Did you observe any
files with names ERR around the output file?
Can you tell us what heap size you are using for this task (-Xmx? -Xms?)?
I have a similar issue. Strangely, setting -Xmx does not help.
@icemduru GenotypeGVCFs is not known to have memory leaks. Our tests indicated that it needs only around 4-6 GB of total memory to genotype 120 whole-genome samples (per contig).
Thanks for the reply. I have 370 samples. I ran HaplotypeCaller on each of them, then ran GenomicsDBImport for each chromosome (it is a plant genome, about 420 Mb in total size), and then tried to run GenotypeGVCFs for each chromosome. I attached the log file for chr1.
Hi @icemduru, keep in mind that the memory limit on Slurm may be set per user rather than per task, so you may need to run a single contig at a time, or perhaps two simultaneously. Otherwise, Slurm may interfere with all of the tasks and cancel all of your jobs. One final reminder: we strongly recommend setting the temporary directory to somewhere other than /tmp. The Slurm workload manager interferes with /tmp and sometimes causes premature termination of GATK processes by deleting the extracted native libraries and accessory files. I hope this helps.
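The advice above (one or two contigs per job, temporary files outside /tmp, an explicit heap size) can be collected into a small command builder. This is a sketch under assumptions: it uses the standard GATK4 wrapper options --java-options, -V gendb://, -L, -O and --tmp-dir, while the helper name genotype_gvcfs_command, the paths and the 10 GB heap are hypothetical placeholders. Note that -Xmx bounds only the Java heap, so each job's real footprint under a per-user Slurm limit can be larger than the heap value.

```python
import shlex


def genotype_gvcfs_command(reference: str, workspace: str, output: str,
                           interval: str, heap_gb: int, tmp_dir: str) -> list[str]:
    """Build a GenotypeGVCFs argument list for one contig or interval (hypothetical helper).

    `heap_gb` caps only the Java heap (-Xmx); native memory outside the heap is not
    limited by it. `tmp_dir` is given to both the JVM and GATK so temp files avoid /tmp.
    """
    java_opts = f"-Xmx{heap_gb}g -Djava.io.tmpdir={tmp_dir}"
    return [
        "gatk", "--java-options", java_opts,
        "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",  # GenomicsDB workspace created by GenomicsDBImport
        "-L", interval,
        "-O", output,
        "--tmp-dir", tmp_dir,
    ]


# Hypothetical example: one job for chr1 with a 10 GB heap and a scratch temp directory.
cmd = genotype_gvcfs_command("ref.fasta", "chr1_db", "chr1.vcf.gz",
                             "chr1", heap_gb=10, tmp_dir="/scratch/me/gatk_tmp")
print(shlex.join(cmd))
```

Submitting one such command per contig, and only as many at once as the per-user memory limit allows, follows the suggestion in the comment above.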
Thank you for your help, but unfortunately it didn't resolve the issue. I've already tried allocating 10 GB of memory with the -Xmx10g flag and redirecting the temporary directory away from /tmp. However, GATK still tries to consume more than 48 GB of RAM, which results in the termination of my run.
Hi again.
Hi,
Same problem. Any solution or update?
Hi @Wangchangsh, our temporary solution until we make an updated release would be to convert the imported GenomicsDB instances to GVCF using
and then use this GVCF file as input for the GenotypeGVCFs tool. This keeps memory usage from rising to unreasonable levels and avoids any apparent leaks. I hope this helps. Regards.
Thank you for your prompt response. I used the -L parameter to split the tasks into Mb-scale intervals to prevent memory issues.
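Splitting the work with -L into Mb-scale intervals, as described above, is easy to script. A minimal sketch, assuming 1-based inclusive interval strings of the form contig:start-end and a contig length obtained elsewhere (for example, from the second column of the reference .fai index). split_contig is a hypothetical helper, and each resulting interval would go to its own GenotypeGVCFs run.

```python
def split_contig(contig: str, length: int, chunk_bp: int = 1_000_000) -> list[str]:
    """Split `contig` (length in bp) into consecutive intervals of at most `chunk_bp`.

    Interval strings are 1-based and inclusive, e.g. "chr1:1-1000000".
    """
    if length <= 0 or chunk_bp <= 0:
        raise ValueError("length and chunk_bp must be positive")
    intervals = []
    for start in range(1, length + 1, chunk_bp):
        end = min(start + chunk_bp - 1, length)  # last chunk may be shorter
        intervals.append(f"{contig}:{start}-{end}")
    return intervals


# Hypothetical 2.5 Mb contig split into 1 Mb chunks.
for interval in split_contig("chr14", 2_500_000):
    print(interval)
# chr14:1-1000000
# chr14:1000001-2000000
# chr14:2000001-2500000
```

Smaller chunks lower the peak memory of each run at the cost of more jobs and per-chunk output files that must be combined afterwards.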
Bug Report
Affected tool(s) or class(es)
GenotypeGVCFs
Affected version(s)
4.6.0.0
Description
When I ran GenotypeGVCFs on a GenomicsDB workspace of 420 samples, the process was interrupted due to significant memory issues: it consumed memory continuously. I ran the same process in 4.5.0.0 and confirmed that it works fine.