Long run time with *_CNV_CALLS_pre_filtered.bed #35
Comments
Hi Weijia, The advice about more than 50 entries in the seeds file applies to the AA_CNV_SEEDS.bed file. It would appear the problem though is the Jens |
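For anyone who wants to check the seed counts across many samples, here is a minimal Python sketch that counts entries per seeds file and flags those above 50. The glob pattern `*_AA_CNV_SEEDS.bed`, the function names, and the output directory layout are assumptions for illustration, not part of AmpliconSuite; adjust them to match your own output.

```python
from pathlib import Path


def count_bed_entries(bed_path):
    """Count data lines in a BED file, skipping blank and header/comment lines."""
    count = 0
    with open(bed_path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith(("#", "track", "browser")):
                continue
            count += 1
    return count


def flag_large_seed_files(directory, pattern="*_AA_CNV_SEEDS.bed", limit=50):
    """Return {filename: entry_count} for seed files with more than `limit` entries.

    `pattern` is an assumed naming scheme; change it if your files differ.
    """
    flagged = {}
    for bed in sorted(Path(directory).glob(pattern)):
        n = count_bed_entries(bed)
        if n > limit:
            flagged[bed.name] = n
    return flagged
```

Calling `flag_large_seed_files("path/to/outputs")` returns an empty dict when every seeds file has 50 entries or fewer.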
Thank you, Jens. I followed your suggestions and changed --cnsize_min to 10000, but these samples have now been running for 2 days (from the BAM inputs) without any output. Their SEEDS.bed files have 0-15 entries. I'm not sure why only these samples are running so much longer than the others. |
Hi Weijia, Are the samples producing any output in the log files or on stdout? If there is no logging output, there may be a problem. We've found that using anything above 40-50x coverage has little impact on the ecDNA discovered, and 5-10x is almost always perfectly adequate unless the ecDNA are highly subclonal. AA's threshold for SV detection scales with baseline coverage, so you are likely not missing critical reads by downsampling, since the original coverage would have a higher threshold for discovering the SVs anyway. I recommend that you try leaving the downsample parameter at its original setting for your long run-time samples to see if that helps.
Leaving the AA parameters at their defaults at first is highly recommended; you can then customize them further if you feel there are things AA is missing. For roughly 5% of the samples we encounter, runtimes are extreme because of the complex nature of the focal amplification. You may need to wait a week or so for a few "bad" samples to finish. |
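As a rough illustration of what downsampling to a target coverage means in terms of reads kept, here is a small Python sketch. It is generic arithmetic, not AA's internal code; the function name is hypothetical, and treating a non-positive target as "no downsampling" is an assumption that mirrors the `--downsample -1` usage in this thread.

```python
def downsample_fraction(original_coverage, target_coverage):
    """Fraction of reads to keep so that coverage drops to roughly the target.

    A non-positive target is treated as "do not downsample" (assumption).
    """
    if original_coverage <= 0:
        raise ValueError("original_coverage must be positive")
    if target_coverage <= 0:
        return 1.0
    # Never keep more than all reads when the sample is already below target.
    return min(1.0, target_coverage / original_coverage)
```

For example, a 50x sample downsampled to 10x keeps about one read in five, while a 5x sample is left untouched.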
That is very reasonable. Thank you for your help! |
Hi Jens,
I am using AmpliconSuite on a set of cancer WGS data. I have 36 samples, and most of them finished successfully, but 10 of them have been running for 5 days and are still not finished. I checked the *_CNV_CALLS_pre_filtered.bed files, and for these samples they have 50-100 entries. I wonder if this is the problem. The README says "if you notice there are > 50 CNV seeds going into AA, there may be something wrong." The bed files are a little large, but I assume <100 entries is still on a reasonable scale?
If this is the issue, do you think it is ok to re-run AA for these 10 samples and split their bed files into two (so there are <50 entries)?
My command line is:
$AASuite"PrepareAA.py" -s $name -t 32 --cnvkit_dir /anaconda3/bin/cnvkit.py --fastqs $name"_R1.fastq.gz" $name"_R2.fastq.gz" --ref hg38 --cnsize_min 500 --downsample -1 --run_AA --run_AC
I used the same command line for all 36 samples. And all the samples have similar fastq sizes as input.
Thanks for your help.
Weijia